Abstract

Cuckoo hashing is an efficient technique for creating large hash tables with high space
utilization and guaranteed constant access times. There, each item can be placed in a
location given by any one out of k different hash functions. In this paper we investigate
further the random walk heuristic for inserting new items into the hash table in an online
fashion. Provided that $k \ge 3$ and that the number of items in the table is below (but
arbitrarily close to) the theoretically achievable load threshold, we show a polylogarithmic
bound for the maximum insertion time that holds with high probability.



1 Introduction 

Hash tables are widely used in applications that need efficient data structures for large sets
of data. A key issue in the use of hash tables is the handling of collisions. A technique
that has attracted quite a bit of attention in recent years is the so-called cuckoo hashing, cf. e.g.
[21, 13, 19, 23] and the references therein, which is based upon the paradigm of the power of
many choices [H, 20].

The term cuckoo hashing was coined by Pagh and Rodler in [23]. In the present work
we will consider a slight variation of it, as defined in [13]. We are given a table T with n
locations, and we assume that each location can hold only one item. Further generalizations
where two or more items can be stored have also been studied, see e.g. [21, 3, 12], but we will
not treat those cases. Moreover, we assume that we have $k \ge 2$ hash functions $h_1, \dots, h_k$,
each of which maps an element x of a universe U of items to a position in the table T. More
precisely, we assume that $h_1, \dots, h_k$ are independent truly random functions $h_i : U \to T$.
This assumption is somewhat idealized, as exponentially many bits would be needed to store
such truly random functions. However, there is theoretical evidence that even "simple" hash
functions can be sufficient in practice, provided that the underlying data stream fulfills certain
natural conditions; we refer the reader to the papers [21] and [8], and the references therein.

In his recent survey [K] Mitzenmacher outlined several open problems that remain to be
solved in order to better understand the power of cuckoo hashing from a theoretical point of
view. Among these were the issue of space utilization and the search for good upper bounds
on the time needed to insert new elements. The first point was recently settled independently
by the first two authors [14], as well as by Frieze and Melsted [15] (for $k \ge 4$) and by Dietzfelbinger
et al. [7]. The aim of this paper is to address the second point. Before we can state our results
formally, we need to outline the results from [7, 14, 15] in more detail.

A natural question in cuckoo hashing is the following. Let us denote by $I \subseteq U$ the set
of available items. As |I| increases, it becomes more and more unlikely that all of them can
be inserted into the table so that each item is assigned to one of its k desired locations. In
other words, if |I| is "small" compared to |T|, then with high probability there is such an
assignment to the locations in the table that respects the k choices of the items. On the other
hand, if |I| becomes "large", then such an assignment does not exist with high probability
(trivially, this happens at the latest when n + 1 items are available). The important question
is whether there is a critical size for I where the probability for the existence of a valid
assignment drops instantaneously in the limiting case from 1 to 0, i.e., whether there is a load
threshold for cuckoo hashing. More precisely, we say that a value $c_k^*$ is the load threshold for
cuckoo hashing with k choices for each element if

$$\lim_{n \to \infty} \Pr\left(\begin{array}{l}\text{there is an assignment of } \lfloor cn \rfloor \text{ items to a table with}\\ n \text{ locations that respects the choices of all items}\end{array}\right) = \begin{cases} 1, & \text{if } c < c_k^*,\\ 0, & \text{if } c > c_k^*.\end{cases} \qquad (1.1)$$

For the case k = 2, where each item has two preferred random locations, there is a natural
connection with random graphs: we think of the n locations of T as the vertices of the graph,
and of the items as edges, which encode the two choices for each item. If $|I| = m$, what we
obtain is the Erdős–Rényi random (multi-)graph $G^*_{n,m}$. The properties of this random graph
are essentially those of the random graph $G_{n,m}$ on n vertices and m distinct random edges.
Moreover, it is easy to see by applying Hall's Theorem that $G_{n,m}$ has no subgraph with more
edges than vertices if and only if the corresponding items can be assigned to the corresponding
locations such that the choices of all items are respected. It is well known that the property
"$G_{n,m}$ has a subgraph with more edges than vertices" coincides with the appearance of a
giant connected component that contains a linear fraction of the vertices; see e.g. [17]. As
the latter is known to happen around m = n/2, we readily obtain that the load threshold for
cuckoo hashing with k = 2 is $c_2^* = 1/2$. In other words, at most half of the table can be
filled in a way that respects the choices of all items.
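For k = 2 the criterion above can be tested mechanically, because a graph has a subgraph with more edges than vertices exactly when one of its connected components does. The following Python sketch is our own illustration, not code from the paper; the function name `assignable_k2` is hypothetical. It treats each item as an edge between its two positions (a self-loop when both hash values coincide) and compares edge and vertex counts per component using a union-find structure.

```python
from collections import Counter


def assignable_k2(n, items):
    """Return True iff every item (a pair of positions in range(n)) can be
    stored at one of its two positions, with at most one item per position.

    Criterion: no connected component of the multigraph whose vertices are
    the positions and whose edges are the items has more edges than vertices.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in items:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    vertices = Counter(find(v) for v in range(n))  # positions per component
    edges = Counter(find(a) for a, _ in items)     # items per component
    return all(edges[r] <= vertices[r] for r in edges)


# A triangle (3 items on 3 positions) fits; a fourth item on the same positions does not.
print(assignable_k2(3, [(0, 1), (1, 2), (2, 0)]))          # True
print(assignable_k2(3, [(0, 1), (1, 2), (2, 0), (0, 2)]))  # False
```

Counting per component rather than enumerating subgraphs keeps the check linear (up to the inverse-Ackermann factor of union-find) in n plus the number of items.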

For $k \ge 3$, results were recently obtained independently by the first two authors [14], as
well as by Frieze and Melsted [15] (for $k \ge 4$) and by Dietzfelbinger et al. [7].

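Theorem 1.1. For any integer $k \ge 3$ let $\xi^*$ be the unique solution of the equation

$$k = \frac{\xi^*\,(1 - e^{-\xi^*})}{1 - e^{-\xi^*} - \xi^* e^{-\xi^*}}.$$

Then $c_k^* = \dfrac{\xi^*}{k\,(1 - e^{-\xi^*})^{k-1}}$ is the load threshold for cuckoo hashing with k choices per element.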

In particular, if there are $\lfloor cn \rfloor$ items, then the following holds with probability $1 - o(1)$.

1. If $c < c_k^*$, then there is an assignment of the items to a table with n locations that
respects the choices of all items.

2. If $c > c_k^*$, then such an assignment does not exist.

Numerically we obtain, for example, that $c_3^* \approx 0.917$, $c_4^* \approx 0.976$ and $c_5^* \approx 0.992$, where "$\approx$"
indicates that the values are truncated to the last digit shown. Moreover, a simple calculation
reveals that $c_k^* = 1 - e^{-k} + o(e^{-k})$ for $k \to \infty$; see [14].
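Theorem 1.1 makes $c_k^*$ easy to evaluate numerically. The sketch below is our own (the name `load_threshold` and the bracketing interval are our choices, not from the paper): it solves the equation for $\xi^*$ by bisection and then applies the formula for $c_k^*$. Truncating to three digits should reproduce the values quoted above.

```python
import math


def load_threshold(k, tol=1e-12):
    """Evaluate c_k^* from Theorem 1.1 for an integer k >= 3."""
    if k < 3:
        raise ValueError("Theorem 1.1 is stated for k >= 3")

    def f(xi):
        # Right-hand side of the equation k = f(xi*) in Theorem 1.1.
        q = math.exp(-xi)
        return xi * (1 - q) / (1 - q - xi * q)

    # f(xi) -> 2 as xi -> 0 and f(xi) > xi, so f(0.5) < 3 <= k < f(k):
    # the interval [0.5, k] contains the (unique) solution.
    lo, hi = 0.5, float(k)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < k:
            lo = mid
        else:
            hi = mid
    xi = (lo + hi) / 2
    return xi / (k * (1 - math.exp(-xi)) ** (k - 1))


for k in (3, 4, 5):
    # Truncate to three digits, as in the text.
    print(k, math.floor(load_threshold(k) * 1000) / 1000)
```

Bisection only relies on the sign change at the ends of the interval; the uniqueness of $\xi^*$ asserted in Theorem 1.1 guarantees that the crossing it finds is the right one.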

Note that Theorem 1.1 is non-algorithmic: it merely states that whenever the load of the
hash table is below the threshold, then with probability $1 - o(1)$ there exists an assignment
of the items that respects the choices of all items. The theorem does not, however, address
the question of whether we can actually find such an assignment efficiently. This is the problem
that we address in this paper.

More precisely, our main aim in this paper is to bound the time needed to insert an
additional item into the table, assuming that some given number of items have already been
inserted properly. A natural insertion strategy is to use a randomized approach: insert the
new item randomly at one of its locations; if the location is currently occupied, kick the item
out and reinsert it randomly at one of its other k − 1 locations; recurse if necessary (see below
for a more formal statement of this algorithm). In [16] Frieze et al. studied this algorithm
and showed that the running time is polylogarithmic with high probability, provided $k \ge 8$
and the load of the hash table is not too close to the threshold $c_k^*$.

The main theorem of our paper states that the random insertion algorithm actually succeeds
in polylogarithmic time with high probability for any number of inserted items arbitrarily
close to the load threshold and for all $k \ge 3$.

Theorem 1.2. For $k \ge 3$ let $c := \ldots$. For any set of $m = (1 - \varepsilon) c_k^* n$ items of the
universe U, where $\varepsilon \in (0, 1)$, if the hash functions $h_1, \dots, h_k$ are random, then for any $\zeta > 0$
sufficiently small, with probability $1 - o(1)$ each of the items will be inserted into a table with n
positions in time $O(\log^{2+c+\zeta} n)$.

In fact, we show the following slightly stronger statements: (i) if $m = (1 - \varepsilon) c_k^* n$ elements
are inserted into a hash table with n positions, for some $\varepsilon > 0$, then with probability $1 - o(1)$
the positions determined by the hash functions satisfy certain 'nice' structural properties, and
(ii) if m elements satisfy these 'nice' properties, then an additional element can be inserted
in time $O(\log^{2+c+\zeta} n)$ with probability $1 - O(n^{-1-\zeta/2})$. Observe that $c \approx 2.66$ for k = 3,
$c \approx 1.54$ for k = 4, and $c \approx 1.15$ for k = 5. Moreover, $c = \ldots + O(\ldots)$ as k grows. Our
exponent in the bound of the running time thus compares very favorably with that from [16].

2 The Insertion Algorithm and its Analysis 
2.1 Random Walk Insertion 

Roughly speaking, the insertion procedure that we study works as follows. Assume that m
items have been inserted and we are about to insert the (m+1)st item. This item is assigned k
random positions from the hash table and sits on one of them. However, this position might
already be occupied by a previously inserted item. If this is the case, then this item is
kicked out and sits on one of the other k − 1 selected positions. In turn, this position might
be occupied by another item, which is kicked out and goes to one of its remaining k − 1
positions. This process is repeated until a free position is found, possibly indefinitely. If each
selection takes place uniformly at random among the k − 1 available positions, this is a random
walk on the positions of the table (or, as we shall shortly see, on the vertex set of a suitably
defined hypergraph). Hence the name random walk insertion.

Let us now be more formal. We assume that there are $k \ge 3$ hash functions $h_1, \dots, h_k$
which map a universe U to a hash table T with n positions. We denote by T(i) the contents
of the ith position of T, and we write $T(i) = \emptyset$ if the ith position is empty. Also, if e is an
item that has been inserted into the table, we denote by H(e) the index of the hash function
that e currently uses. With these definitions available, we are able to describe the insertion
algorithm formally.



Table 1: The random-walk insertion algorithm

    suc ← FALSE;
    e ← e_{m+1};
    j ← 0;
    repeat
        choose i uniformly at random from {1, ..., k} \ {j};
        H(e) ← i;
        if T(h_i(e)) ≠ ∅ then
            swap e and T(h_i(e));
            j ← H(e);
        else
            T(h_i(e)) ← e;
            suc ← TRUE;
        endif
    until suc = TRUE

The analysis of this algorithm reduces the allocation of elements to a hypergraph setting.
The hash table of size n corresponds to a set of n vertices, and the locations of each item
correspond to a hyperedge of size at most k. As we assume that the hash functions are
truly random, a set of m elements gives rise to a k-uniform random hypergraph with n
vertices and m edges, each chosen uniformly at random (and with replacement) among all
k-multisubsets of the vertex set. Thus, the random walk on the positions of the hash table, which
is induced by the insertion algorithm, naturally gives rise to a random walk on the vertex set
of this hypergraph. The techniques that we use to prove Theorem 1.2 combine the
approach of Frieze et al. [16] with strong structural properties of such a random
hypergraph.

2.2 Random Hash Functions and Random Hypergraphs

Assume that we have already inserted m elements which have been allocated among the n
positions of the hash table, using the random hash functions $h_1, \dots, h_k$. Translated into a
random hypergraph setting, this means that a random (multi-)hypergraph $H^*_{n,m,k}$ on the vertex
set $V_n := \{1, \dots, n\}$ has been created, where each of the m edges is an ordered k-tuple of
elements of $V_n$, chosen with probability $n^{-k}$, with replacement. Each edge corresponds to
the k choices of an item. Note that in this definition of $H^*_{n,m,k}$ we actually interpret the word
"multi" in two ways: (i) an edge has size k, but may contain a particular vertex several times,
and (ii) the edge set of $H^*_{n,m,k}$ may be a multiset, i.e., a particular edge can occur multiple
times.
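To make Table 1 concrete, here is a small Python simulation of the random-walk insertion. It is a sketch under our own assumptions: the truly random hash functions are modeled by drawing k independent uniform positions per item (repetitions allowed, as in $H^*_{n,m,k}$), positions and hash indices are 0-based, and the names `random_walk_insert`, `fill_table` and the `max_steps` cut-off are ours. The dictionary `current` plays the role of H(e).

```python
import random


def random_walk_insert(table, choices, current, item, rng, max_steps=10**6):
    """Insert `item` by the random-walk rule of Table 1.

    table   -- list of length n; table[p] is the item stored at position p, or None
    choices -- dict: item -> tuple (h_1(x), ..., h_k(x)) of positions
    current -- dict: item -> index (0-based) of the hash function in use, i.e. H(e)
    Returns the number of placements made; raises RuntimeError after max_steps.
    """
    k = len(choices[item])
    e, j = item, None  # j: hash index e may not reuse (None = no restriction)
    for step in range(1, max_steps + 1):
        i = rng.choice([t for t in range(k) if t != j])  # uniform over allowed indices
        current[e] = i
        pos = choices[e][i]
        evicted, table[pos] = table[pos], e  # place e, remembering the old occupant
        if evicted is None:
            return step  # landed on a free position
        # The evicted item must move to one of its other k - 1 positions.
        e, j = evicted, current[evicted]
    raise RuntimeError("no free position found within max_steps")


def fill_table(n, m, k, seed=0):
    """Insert items 0, ..., m-1, each with k uniformly random positions."""
    rng = random.Random(seed)
    table, choices, current, steps = [None] * n, {}, {}, []
    for x in range(m):
        choices[x] = tuple(rng.randrange(n) for _ in range(k))
        steps.append(random_walk_insert(table, choices, current, x, rng))
    return table, choices, current, steps


table, choices, current, steps = fill_table(2000, 1500, 3, seed=1)
print(max(steps))  # longest insertion observed in this run
```

The example fills a table with n = 2000 positions to load 0.75, which is below $c_3^*$; the list `steps` records the length of each individual random walk.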

With slight abuse of terminology, we say that a multi-hypergraph $H = H(V, E)$ is a k-graph
if it is k-bounded, that is, every hyperedge is a subset of V with at most k vertices.
Observe that the random hypergraph $H^*_{n,m,k}$ corresponds in a natural way to a k-graph, obtained by
projecting each ordered k-tuple of vertices that forms an edge in $H^*_{n,m,k}$ to the set of vertices
contained in this k-tuple. In what follows, we will use the symbol $H^*_{n,m,k}$ to denote both
objects; each time the interpretation should be clear from the context.

2.3 Orientations and the h-neighborhood of a Vertex

For a k-graph $H = H(V, E)$ with vertex set V and edge set E where $|E| \le |V|$, an injective
mapping $h : E \to V$ of the set of edges into the set of vertices of H such that each edge is
mapped to one of its vertices is called an orientation of the edges. In the setting of cuckoo
hashing, an orientation corresponds to an assignment of the items to the hash table.

Assume that $H = H(V, E)$ is a k-graph with $|E| \le |V|$ and let h be an orientation of E.
For $v \in V$ we define its first h-neighborhood to be the set of vertices apart from v itself that
belong to the edge which is oriented to v. More generally, having defined the (t − 1)st
h-neighborhood, for t > 1, the tth h-neighborhood of v is the set of vertices which belong to the
edges that are oriented to the vertices of the (t − 1)st h-neighborhood of v and belong neither
to it nor to any one of the previous neighborhoods. Also, we define the 0th h-neighborhood
of v to be v itself. Note that for any $t \ge 1$, the tth neighborhood may contain at most $(k-1)^t$
vertices. Also, we denote by $N_{h,t}(v)$ the number of vertices that are within h-distance t from v
and by $\mathcal{N}_{h,t}(v)$ the set of these vertices. Thus, since $k \ge 3$,

$$N_{h,t}(v) \le \sum_{i=0}^{t} (k-1)^i \le (k-1)^{t+1}. \qquad (2.1)$$

If v is a vertex of H and S a subset of V(H), then the h-distance of S from v is
$d_h(v, S) := \min\{t : \mathcal{N}_{h,t}(v) \cap S \ne \emptyset\}$. Finally, if h is an orientation, we denote by $F_h$ the set of free
vertices, that is, vertices to which no edge has been oriented. We call the remaining vertices
occupied.
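The h-neighborhoods and the distance $d_h(v, F_h)$ can be computed by breadth-first search: from an occupied vertex u one moves to the other vertices of the edge oriented to u. The Python sketch below is ours; it assumes edges are given as tuples of vertices and the orientation as a list `h` with `h[idx]` the vertex to which `edges[idx]` is oriented. The example hypergraph is a hypothetical path of three 2-edges.

```python
import math
from collections import deque


def h_distance_to_free(edges, h, v):
    """d_h(v, F_h): least t such that a vertex within h-distance t of v is free.

    edges -- list of edges, each a tuple of vertices
    h     -- orientation: edges[idx] is oriented to vertex h[idx] (h is injective)
    Returns math.inf if no free vertex can be reached from v.
    """
    oriented_to = {h[idx]: idx for idx in range(len(edges))}
    dist = {v: 0}
    queue = deque([v])
    while queue:  # BFS visits vertices in order of nondecreasing h-distance
        u = queue.popleft()
        if u not in oriented_to:  # no edge is oriented to u: u is free
            return dist[u]
        for w in edges[oriented_to[u]]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return math.inf


def near_free_set(n, edges, h, C):
    """The set S of vertices v in range(n) with d_h(v, F_h) <= C."""
    return {v for v in range(n) if h_distance_to_free(edges, h, v) <= C}


# Vertices 0..4; edges {0,1}, {1,2}, {2,3} oriented to 1, 2, 3; free vertices 0 and 4.
edges, h = [(0, 1), (1, 2), (2, 3)], [1, 2, 3]
print([h_distance_to_free(edges, h, v) for v in range(5)])  # [0, 1, 2, 3, 0]
print(near_free_set(5, edges, h, 1))                         # {0, 1, 4}
```

`near_free_set` is the set S used in Section 2.4 for a given constant C.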

The insertion algorithm can be viewed as a random walk on the vertex set of the corresponding
k-uniform hypergraph. Note that every (proper) assignment of the elements to
positions in the hash table corresponds to an orientation in the associated hypergraph and
vice versa. If we want to stress that the random walk starts with a particular assignment,
which corresponds to an orientation h, we also speak of an h-random walk on the vertex set of
the hypergraph $H = H(V, E)$.

2.4 Proof of Theorem 1.2: Analysis of the Insertion Algorithm

Let us fix an $\varepsilon > 0$ and assume that $m = (1 - \varepsilon) c_k^* n$ items have been allocated. The corresponding
hypergraph is distributed as $H^*_{n,m,k}$. Let us fix the realization of $H^*_{n,m,k}$ and also let h be
the orientation on the edges of $H^*_{n,m,k}$ induced by the allocation of the inserted items. We
will bound the running time of the h-random walk performed on this particular realization
of $H^*_{n,m,k}$ under the assumption that the latter satisfies some high probability properties,
which will be stated explicitly below.
Now, suppose that the new element $e_{m+1}$ is initially inserted into vertex $v \in V_n$, and
assume that v is occupied, for otherwise we are done. Following Frieze et al. [16], we consider
a decomposition of the vertex set $V_n$ into two sets according to the h-distance of $F_h$ from
each vertex. In particular, for some C > 0, we let $S \subseteq V_n$ be the set of vertices $v \in V_n$ such
that $d_h(v, F_h) \le C$ and let $B_h := V_n \setminus S$. In the work of Frieze et al. [16], the parameter C was
of order log log n. In the present work, we let S be the set of vertices that have h-distance at
most C from $F_h$, for some suitable constant C. We show that there is a constant $C = C(\varepsilon)$
such that S covers almost all of the hypergraph.

Note that if $v \in S$, then the definition of S implies that there is at least one free vertex
within h-distance C from v, and the h-random walk will thus hit a free vertex with probability
at least $1/(k-1)^C$ within the next C steps.

In order to treat the case $v \notin S$, we will first show that certain expansion properties
of $H^*_{n,m,k}$ (that hold with probability $1 - o(1)$) guarantee that the h-neighborhood up to
h-distance roughly $\log_{k-1} n$ from v grows almost like a (k − 1)-regular hypertree. This, in
turn, will then allow us to show that for vertices $v \notin S$ the h-random walk will hit S with
reasonably high probability after a logarithmic number of steps.

Our plan for the proof of Theorem 1.2 is thus as follows. In the next two subsections
we define two properties of k-graphs, a density and an expansion property, and show that a
random hypergraph $H^*_{n,m,k}$ has these properties with probability $1 - o(1)$. We also show that
the density property implies that the set S is large, and that the expansion property implies
that a random walk starting in a vertex not in S will hit S within a logarithmic number of
steps with high probability. In Section 2.4.3 we then formally show how these two properties
conclude the proof of Theorem 1.2.

2.4.1 Density Properties 

Assume that $H = H(V, E)$ is a k-graph and for every subset $V' \subseteq V$ we denote by $E(V')$
the set of edges of H induced on V'. Additionally, we set $e(V') := |E(V')|$. We define the
following property for $\delta > 0$.

Property $\mathcal{D}_\delta$: For all non-empty $V' \subseteq V$ we have

$$e(V') \le (1 - \delta)\,|V'|.$$
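For very small hypergraphs, Property $\mathcal{D}_\delta$ can be checked by brute force over all non-empty vertex subsets. This exponential-time Python sketch only illustrates the definition; the function names are ours, and edges are counted with multiplicity, matching the multi-hypergraph setting.

```python
from itertools import combinations


def induced_edge_count(edges, subset):
    """e(V'): number of edges (with multiplicity) all of whose vertices lie in V'."""
    return sum(1 for edge in edges if set(edge) <= subset)


def has_property_D(vertices, edges, delta):
    """Property D_delta: e(V') <= (1 - delta)|V'| for every non-empty V'.

    Brute force over all 2^|V| - 1 subsets, so only usable for tiny examples.
    """
    vertices = list(vertices)
    for r in range(1, len(vertices) + 1):
        for sub in combinations(vertices, r):
            if induced_edge_count(edges, set(sub)) > (1 - delta) * r:
                return False
    return True


# The densest subset of this path is {0, 1, 2, 3}: 3 edges on 4 vertices.
path = [(0, 1), (1, 2), (2, 3)]
print(has_property_D(range(5), path, 0.25))  # True  (3 <= 0.75 * 4)
print(has_property_D(range(5), path, 0.3))   # False (3 > 0.7 * 4)
```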

The next proposition states that if H has Property $\mathcal{D}_\delta$, then most of the vertices are such
that $F_h$ is within bounded h-distance from them.

Proposition 2.1. If a k-graph $H = H(V, E)$ has Property $\mathcal{D}_\delta$ and h is an orientation of
E(H), then for all $\alpha > 0$ there exist $C = C(\alpha, \delta) > 0$ and a set $S \subseteq V$ of size at least
$(1 - \alpha)|V|$ with the property that for every $v \in S$ we have $d_h(v, F_h) \le C$.

We defer the proof to Section 3. The next theorem states that $H^*_{n,m,k}$ has Property $\mathcal{D}_\delta$ with
high probability for some suitable δ. More precisely, the following holds.

Theorem 2.2. Let $\varepsilon > 0$ and suppose that $m = (1 - \varepsilon) c_k^* n$. There exists a $\delta = \delta(\varepsilon, k) > 0$
such that $H^*_{n,m,k}$ has Property $\mathcal{D}_\delta$ with probability $1 - O(1/n)$.

We prove this theorem in Section 4. Together with Proposition 2.1 this gives us a statement
about the typical structure of $H^*_{n,m,k}$.
Corollary 2.3. Let $\varepsilon > 0$ and set $m = (1 - \varepsilon) c_k^* n$. Then $H^*_{n,m,k}$ has the following property
with probability $1 - O(1/n)$. For every $\alpha > 0$ there is a $C = C(\alpha, \varepsilon) > 0$ such that for every
orientation h there exists a set $S \subseteq V_n$ such that $|S| \ge (1 - \alpha) n$ and every $v \in S$ satisfies
$d_h(v, F_h) \le C$.

2.4.2 Expansion Properties 

For a set of edges E' of a hypergraph $H = H(V, E)$, we denote by V(E') the set of vertices
contained in edges of E'. We say that a k-graph $H = H(V, E)$ has the expansion property $\mathcal{E}$ if it
satisfies the following two conditions:

Property $\mathcal{E}$:

1. For all $E' \subseteq E$ with $\log\log|V| \le |E'| \le |V|/k$ we have

$$|V(E')| \ge (k - 1 - \varepsilon_{|E'|})\,|E'|, \quad \text{where } \varepsilon_s = \ldots$$

2. For all $E' \subseteq E$ with $|E'| < \log\log|V|$ we have

$$|V(E')| \ge (k - 1)\,|E'|.$$

Note that the choice of the parameters in this definition is somewhat arbitrary and not best
possible. Nevertheless, they suffice for our arguments. In a later section we show that $H^*_{n,m,k}$ has
Property $\mathcal{E}$ with high probability.

Proposition 2.4. If $m \le c_k^* n$, then $H^*_{n,m,k}$ has Property $\mathcal{E}$ with probability $1 - o(1)$.

Given an orientation h of a hypergraph and a vertex v, recall that $N_{h,t}(v)$ is the number
of vertices within h-distance at most t from v. In Section 5 we show the following.

Lemma 2.5. For $k \ge 3$, let $H = H(V, E)$ be a k-graph on n vertices which has Property $\mathcal{E}$,
and let h be any orientation of its edges. Then, for any $\alpha > 0$ small enough and for any $\zeta > 0$,
with $T = \log_{k-1} n + (c + \zeta)\log_{k-1}\log_{k-1} n$ and c as in Theorem 1.2, the following holds for
n sufficiently large. If $v \in V$ is such that there is no free vertex within h-distance T from v,
then $N_{h,T}(v) \ge 2\alpha n$.

Note that Lemma 2.5 only handles the case where there are no free vertices within
h-distance T from a given vertex v. Intuitively, it seems obvious that this case should dominate
the running time of the insertion algorithm. To make this formal, we need an additional
definition. Given a hypergraph $H = H(V, E)$ and an orientation h of its edges, we define
an auxiliary hypergraph $\tilde H = \tilde H(V', E')$ by replacing every free vertex of H by a
(k − 1)-regular hypertree of depth T (on a new set of vertices) rooted at this vertex. We extend the
orientation of the edges of H to the edges of $\tilde H$ by orienting each new edge towards the root
of its tree, and we denote the resulting orientation by h'. Thus the leaves of each such tree are
the free vertices of $\tilde H$. As we will see later, Proposition 2.4 together with Lemma 2.5 also
implies a good neighborhood growth for $\tilde H$.

Corollary 2.6. For every $k \ge 3$, let $H = H(V, E)$ be a k-graph on n vertices which has
Property $\mathcal{E}$, let h be any orientation of its edges, and let $\tilde H = \tilde H(V', E')$ and h' be as defined
above. Then, for any $\alpha > 0$ small enough and for any $\zeta > 0$, with $T = \log_{k-1} n + (c + \zeta)\log_{k-1}\log_{k-1} n$
and c as in Theorem 1.2, the following holds for n sufficiently large. For all
$v \in V$ we have $N_{h',T}(v) \ge 2\alpha n$.
2.4.3 Proof of Theorem 1.2

We can now put together Corollary 2.3 and Corollary 2.6 and derive a high probability bound
on the running time of the random walk insertion algorithm. Assume that $m = (1 - \varepsilon) c_k^* n$.
From Theorem 2.2 we know that there exists a $\delta = \delta(\varepsilon, k) > 0$ such that $H^*_{n,m,k}$ satisfies
Property $\mathcal{D}_\delta$ with probability $1 - o(1)$. From Proposition 2.4 we also know that $H^*_{n,m,k}$
satisfies Property $\mathcal{E}$ with probability $1 - o(1)$. We may thus assume that we begin with
a realization of $H^*_{n,m,k}$ which has both properties. For convenience, we call this
hypergraph H. Fix an orientation h of its edges. Next, choose α small enough so that
Corollary 2.6 can be applied and set $C = C(\alpha, \varepsilon)$ as in Corollary 2.3. Note that this also
specifies the set S.

Lemma 2.7. Assume that H, h, and C are as specified above and, for $\zeta > 0$, let $T = T(\zeta)$
and c be defined as in Corollary 2.6. Then the probability that an h-random walk starting at
a vertex $v_0$ hits a free vertex within T + C steps is at least $\dfrac{\alpha}{(k-1)^C (\log_{k-1} n)^{c+\zeta}}$.

Proof. Let us consider the following stopping rule. Starting from a vertex $v_0$, we walk either
for T steps or until we hit a free vertex, whichever occurs earlier. If the latter is not the case,
then we walk for C additional steps or until we hit a free vertex. We consider the same
h'-random walk on $\tilde H$; that is, the h'-random walk there starts at $v_0$ and makes the same random
choices as the one in H (and new ones, if the random walk on H stopped at a free vertex).
Let u be the vertex that has been reached after T steps. The growth property guaranteed
by Corollary 2.6, together with the fact that at most αn vertices lie outside S (Corollary 2.3),
implies that there are at least $2\alpha n - \alpha n = \alpha n$ vertices within h'-distance T
from $v_0$ in $\tilde H$ that either belong to S or to one of the trees we added to H. The probability
of hitting such a vertex after T steps is at least $\alpha n/(k-1)^T$. If $u \notin S$, then u belongs to one of
the trees that we added to H, and we conclude that the h-random walk on H has stopped
before reaching T steps. If $u \in S$, then we stop the h'-random walk in $\tilde H$ but we continue it
in H for another C steps or until a free vertex is found. The probability that this part of
the stopping rule ends up at a free vertex is at least $1/(k-1)^C$, since the assumption that u
belongs to S implies that there is at least one free vertex within h-distance C from u. Thus,
using $(k-1)^T = n\,(\log_{k-1} n)^{c+\zeta}$, the probability of success is at least
$\dfrac{\alpha n}{(k-1)^{T+C}} = \dfrac{\alpha}{(k-1)^C (\log_{k-1} n)^{c+\zeta}}$, which concludes the proof. □

To conclude the proof of Theorem 1.2, we split the h-random walk into phases, where a
phase is either a window of duration T + C or lasts until a free vertex is hit. We repeatedly
use the above lemma to bound the number of unsuccessful phases. More precisely, if the
first phase is unsuccessful, then the above analysis applies with the starting vertex being the
vertex in which the previous phase ended, which we call $v_1$, and with a new orientation $h_1$.



The above arguments imply that for any $\zeta > 0$ the probability that at least $\log_{k-1}^{c+1+2\zeta} n$
phases are unsuccessful, given that $e_{m+1}$ is inserted into $v_0$, is $O(1/n^{1+\zeta})$. As each phase lasts
for $T + C = O(\log n)$ steps, we deduce that with probability $1 - O(1/n^{1+\zeta})$ the random walk
inserts $e_{m+1}$ within $O(\log^{c+2+2\zeta} n)$ steps. As the total number of inserted elements is O(n),
a union bound over all insertions concludes the proof of Theorem 1.2.
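The counting behind the last paragraph can be spelled out as follows. This is a sketch that assumes, as in the argument above, that the bound of Lemma 2.7 applies afresh at the start of every phase (whatever the current vertex and orientation), and takes $N = (\log_{k-1} n)^{c+1+2\zeta}$ phases; p denotes the per-phase success probability.

```latex
p \;\ge\; \frac{\alpha}{(k-1)^{C}\,(\log_{k-1} n)^{c+\zeta}},
\qquad
\Pr[\text{all } N \text{ phases fail}] \;\le\; (1-p)^{N} \;\le\; e^{-pN}
  \;\le\; \exp\!\Big(-\frac{\alpha}{(k-1)^{C}}\,(\log_{k-1} n)^{1+\zeta}\Big)
  \;=\; O\big(n^{-1-\zeta}\big),
\\[4pt]
\text{steps} \;\le\; N\,(T+C) \;=\; (\log_{k-1} n)^{c+1+2\zeta}\cdot O(\log n)
  \;=\; O\big(\log^{c+2+2\zeta} n\big).
```

The final equality in the first line holds because $\exp(-a(\log_{k-1} n)^{1+\zeta})$ with a constant a > 0 decays faster than any fixed power of n.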



3 Proof of Proposition 2.1: h-neighborhoods and the Density
of a Hypergraph

The main idea of the proof is as follows. As H has Property $\mathcal{D}_\delta$ and h is injective, we know that
$|F_h| = |V| - |E| \ge \delta|V|$. Consider an edge e that contains a vertex from $F_h$. The orientation h
assigns this edge to some vertex $v_e \in e$. Suppose that we remove all free vertices and all edges
that contain a free vertex. As the removal of any edge e that contains a free vertex generates
a new free vertex (namely $v_e$), this process generates a subhypergraph of H, which is induced by
$V \setminus F_h$, with a new set of free vertices. For this subhypergraph we can again use Property $\mathcal{D}_\delta$
to deduce that the number of free vertices is at least a δ-fraction of the number of vertices.
We repeat this stripping until we are left with at most $\alpha|V|$ vertices, and show that a
constant number of rounds suffices.

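Let us now be more formal. Let $F_0 := F_h$ and $L_0 := V$, and define inductively $L_{i+1} :=
L_i \setminus F_i$ and let $F_{i+1}$ be the set of free vertices in the hypergraph induced by $L_{i+1}$. Since H
has Property $\mathcal{D}_\delta$, it follows that $|F_0| \ge \delta|V| = \delta|L_0|$. We claim that for all $i \ge 0$ we have
$|F_{i+1}| \ge \delta|L_{i+1}|$ as well. The two crucial observations are:

(i) $|L_{i+1}| = e(L_i)$, as $L_{i+1}$ contains exactly the vertices that are the images (under h) of
the edges in the hypergraph induced by $L_i$.

(ii) $|F_{i+1}| = e(L_i) - e(L_{i+1})$, as every edge that belongs to the hypergraph induced by $L_i$
but not to the one induced by $L_{i+1}$ generates exactly one free vertex in $F_{i+1}$.

As H has Property $\mathcal{D}_\delta$, we also know that $e(L_{i+1}) \le (1 - \delta)|L_{i+1}|$. Hence,

$$|F_{i+1}| \overset{(ii)}{=} e(L_i) - e(L_{i+1}) \ge e(L_i) - (1 - \delta)|L_{i+1}| \overset{(i)}{=} \delta|L_{i+1}|.$$

Thus

$$|L_{i+1}| = |L_i| - |F_i| \le (1 - \delta)|L_i| \quad \text{for all } i \ge 0.$$

To conclude the proof, observe that this implies that for all $t \ge 1$

$$|L_t| \le (1 - \delta)^t |L_0| = (1 - \delta)^t |V|.$$

Therefore, choosing $t = \lceil \log_{1-\delta} \alpha \rceil$, we deduce that $|L_t| \le \alpha|V|$. Moreover, by induction on i,
every vertex of $F_i$ is at h-distance at most i from $F_h$: a vertex of $F_i$ that is occupied in H is the
image of an edge that is not induced by $L_i$, so this edge contains a vertex of some $F_j$ with $j < i$.
Thus, we may take $S := V \setminus L_t = F_0 \cup \dots \cup F_{t-1}$ and $C(\alpha, \delta) = \lceil \log_{1-\delta} \alpha \rceil$.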
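The layers $L_0 \supseteq L_1 \supseteq \dots$ from this proof are easy to compute. The following Python sketch (names ours) uses observation (i) directly, replacing $L_i$ by the set of images of the edges induced by $L_i$, and also evaluates the round bound $C(\alpha, \delta) = \lceil \log_{1-\delta} \alpha \rceil$. The small example is a hypothetical path used for illustration only.

```python
import math


def stripping_layers(n, edges, h):
    """Sizes |L_0|, |L_1|, ... from the proof of Proposition 2.1.

    L_0 = V = range(n); L_{i+1} = L_i minus F_i, where F_i are the vertices of L_i
    that are free in the hypergraph induced by L_i. By observation (i), L_{i+1}
    is the set of images under h of the edges induced by L_i.
    """
    layer = set(range(n))
    sizes = [len(layer)]
    while layer:
        images = {h[idx] for idx, edge in enumerate(edges) if set(edge) <= layer}
        if images == layer:
            break  # F_i is empty: the induced hypergraph violates Property D_delta
        layer = images
        sizes.append(len(layer))
    return sizes


def rounds_bound(alpha, delta):
    """C(alpha, delta) = ceil(log_{1-delta} alpha): rounds until |L_t| <= alpha |V|."""
    return math.ceil(math.log(alpha) / math.log(1 - delta))


print(stripping_layers(5, [(0, 1), (1, 2), (2, 3)], [1, 2, 3]))  # [5, 3, 2, 1, 0]
print(rounds_bound(0.01, 0.1))                                   # 44
```

In the path example the removed sets are $F_0 = \{0, 4\}$, $F_1 = \{1\}$, $F_2 = \{2\}$, $F_3 = \{3\}$, and each $F_i$ consists of vertices at h-distance exactly i from $F_h$.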

4 Proof of Theorem 2.2: the Subgraph Density of $H^*_{n,m,k}$

The core of a hypergraph is its maximum subhypergraph of minimum degree at least 2. Of
course, the core of a hypergraph might be empty (that is, the null hypergraph). We will
denote the core of a hypergraph H by C(H). A standard algorithm that reveals the core
of a given hypergraph is the so-called stripping process. During this process, we repeatedly
choose a vertex of degree 1 and make it isolated by deleting the edge that it is incident
to. This process stops when there are no more vertices of degree 1, and what remains
is either the empty hypergraph, if the core is empty, or otherwise the core itself together with
a (possibly empty) collection of isolated vertices.
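A worklist-based version of the stripping process is sketched below in Python (our own illustration; the name `core_edges` is hypothetical). It assumes each edge is a tuple of distinct vertices, as in $H_{n,m,k}$, keeps the set of live incident edges per vertex, and returns the edges of C(H); vertices left without edges are the isolated ones.

```python
from collections import defaultdict


def core_edges(edges):
    """Edges of the core C(H), computed by the stripping process.

    Repeatedly picks a vertex of degree 1 and deletes its unique edge; the
    surviving edges form the maximum subhypergraph of minimum degree >= 2.
    """
    incident = defaultdict(set)  # vertex -> indices of live edges containing it
    for idx, edge in enumerate(edges):
        for v in edge:
            incident[v].add(idx)
    alive = set(range(len(edges)))
    stack = [v for v in incident if len(incident[v]) == 1]
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # degree changed after v was pushed
        (idx,) = incident[v]
        alive.discard(idx)
        for w in edges[idx]:
            incident[w].discard(idx)
            if len(incident[w]) == 1:
                stack.append(w)
    return [edges[idx] for idx in sorted(alive)]


# The four triples on {0, 1, 2, 3} form a core; the pendant edge (3, 4, 5) is stripped.
k4 = [(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3)]
print(core_edges(k4 + [(3, 4, 5)]) == k4)  # True
```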

To prove Theorem 2.2 we will first show a lemma stating that any given hypergraph
with good expansion properties (as in Proposition 2.4) whose core, in addition, has
Property $\mathcal{D}_\delta$ also has Property $\mathcal{D}_{\delta'}$ for some $\delta' = \delta'(\delta, k)$.

Lemma 4.1. Let $H = H(V, E)$ be a k-graph where every edge contains at least two vertices
such that

1. C(H) has Property $\mathcal{D}_\delta$ for some $0 < \delta < 1/4$;

2. H has Property $\mathcal{E}$.

Then there exists $\zeta' \in (0, 1)$ such that H itself has Property $\mathcal{D}_{\zeta'\delta}$.

Proof. For any subset S of V, we denote by $e_S$ and $v_S$ the number of edges induced by S and the
number of vertices of S, respectively. We will show that for any $S \subseteq V$ we have $e_S \le (1 - \zeta'\delta)\,v_S$.
Towards this goal we will make a case distinction depending on the number of edges that are induced by S.

The first part of Property $\mathcal{E}$ implies that for all $\beta \in (0, 1)$ there exists $0 < \gamma < 1/k$ such
that for all $E' \subseteq E$ with $\log\log|V| \le |E'| \le \gamma|V|$ we have

$$|V(E')| \ge (k - 1 - \beta)\,|E'|.$$

Setting β = 1/2, let γ = γ(1/2) be as above. Let us start with the case $e_S \le \gamma|V|$. The
bound on the density of S can be deduced right away from Property $\mathcal{E}$. Indeed, by using the
first part of Property $\mathcal{E}$, if $e_S \ge \log\log|V|$, then $v_S \ge (k - 1 - 1/2)\,e_S$, yielding

$$\frac{e_S}{v_S} \le \frac{1}{k - 3/2} \le \frac{2}{3}. \qquad (4.1)$$

Now, using the second part of Property $\mathcal{E}$, if $e_S < \log\log|V|$, then $v_S \ge (k - 1)\,e_S$. A
rearrangement yields

$$\frac{e_S}{v_S} \le \frac{1}{k - 1} \le \frac{1}{2}. \qquad (4.2)$$

Suppose now that $e_S > \gamma|V|$. We will make a further case distinction depending on the number
of edges that belong to the core of H[S]. First, let us assume that there are at least $\gamma|V|/2$
edges of H[S] that do not belong to the core of H[S]. To avoid unnecessary complications, let us
assume that H[S] is connected; clearly it is enough to argue about connected sets, since the density
of a disjoint union is at most the largest density of its parts. Consider
the stripping process on H[S]. This induces an ordering on the set of vertices of H[S] not
belonging to the core, namely the ordering according to which these vertices are stripped
off. Moreover, it might happen that the deletion of a vertex and the edge it is contained in
makes some other vertices isolated. We put these vertices immediately after the deleted vertex
in an arbitrary order.

Let $v_1, \dots, v_t$ be this ordering for an appropriate integer $t \ge 1$. Note that whenever
we delete a vertex of degree one during the stripping process, this is accompanied by an
edge that is deleted too, namely the edge that this vertex is contained in. Each one of
the remaining k − 1 vertices of this edge either belongs to the core of H[S] or it is one
of the vertices that come after the deleted vertex in the above ordering. Now, consider
the ordering in reverse and let i be the minimum index such that the number of edges
that contain $v_i, \dots, v_t$ is $\lfloor \gamma|V|/2 \rfloor$; there are $s := t - i + 1$ vertices there. Assuming that
among them there are x vertices which became isolated during the stripping process, we
have $s = \lfloor \gamma|V|/2 \rfloor + x$. The first part of Property $\mathcal{E}$ implies that the number of vertices
that are contained in these edges is at least $(k - 1 - \beta)\lfloor \gamma|V|/2 \rfloor$. Among these vertices at
least $(k - 1 - \beta)\lfloor \gamma|V|/2 \rfloor - s = (k - 2 - \beta)\lfloor \gamma|V|/2 \rfloor - x$ vertices must belong to the core
of H[S]. Let $S_0$ denote these vertices and let $S_1$ denote the remaining vertices of the core
of H[S]. In other words, $|S_0| \ge (k - 2 - \beta)\lfloor \gamma|V|/2 \rfloor - x$. As the core of H[S] is a subgraph of
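the core of H, Property $\mathcal{D}_\delta$ implies that the core of H[S] contains at most $(1 - \delta)(|S_0| + |S_1|)$
edges. Moreover, each of the t stripped vertices except the x isolated ones accounts for exactly one
deleted edge, so H[S] has $t - x$ edges outside its core. Now we can write an upper bound on the
density of S. We have

$$\frac{e_S}{v_S} \le \frac{t - x + (1 - \delta)(|S_0| + |S_1|)}{t + |S_0| + |S_1|} = 1 - \frac{x}{t + |S_0| + |S_1|} - \frac{\delta(|S_0| + |S_1|)}{t + |S_0| + |S_1|}.$$

Using that $t + |S_0| + |S_1| \le |V|$ and $\delta < 1$, we infer that

$$\frac{e_S}{v_S} \le 1 - \frac{x}{|V|} - \frac{\delta|S_0|}{|V|} \le 1 - \frac{x}{|V|} - \frac{\delta\big((k - 2 - \beta)\lfloor \gamma|V|/2 \rfloor - x\big)}{|V|} \le 1 - \frac{\delta(k - 2 - \beta)\lfloor \gamma|V|/2 \rfloor}{|V|}.$$

So, since $\lfloor \gamma|V|/2 \rfloor \ge \gamma|V|/4$ for |V| large enough, this implies that

$$\frac{e_S}{v_S} \le 1 - \frac{\delta(k - 2 - \beta)\lfloor \gamma|V|/2 \rfloor}{|V|} \le 1 - \delta(k - 2 - \beta)\gamma/4. \qquad (4.3)$$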

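Finally, assume that fewer than $\gamma|V|/2$ edges of H[S] do not belong to the core of H[S].
Let $e_{C(S)}$ and $v_{C(S)}$ denote the number of edges and vertices of the core of H[S], respectively.
Since $e_S > \gamma|V|$, the core of H[S] has more than $\gamma|V|/2$ edges, so Property $\mathcal{D}_\delta$ of the core of H implies that

$$(1 - \delta)\,v_{C(S)} \ge e_{C(S)} \ge \gamma|V|/2. \qquad (4.4)$$

Thus

$$\frac{e_S}{v_S} \le \frac{t + e_{C(S)}}{t + v_{C(S)}} \le \frac{t + (1 - \delta)\,v_{C(S)}}{t + v_{C(S)}} = 1 - \delta\,\frac{v_{C(S)}}{t + v_{C(S)}} \le 1 - \delta\,\frac{\gamma|V|}{2(1 - \delta)|V|} = 1 - \frac{\delta\gamma}{2(1 - \delta)}.$$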

Together with (4.1)–(4.3), the above inequality determines the value of ζ'. Taking

$$\zeta' = \min\Big\{\frac{\gamma}{2},\ \frac{(k - 2 - 1/2)\,\gamma}{4}\Big\}$$

suffices. □

The main ingredient in our proof is a statement about the subgraphs of the core itself.



Theorem 4.2. Let $\varepsilon > 0$ and suppose that $m = (1 - \varepsilon) c_k^* n$, where $c_k^*$ is given in Theorem 1.1.
Then, for sufficiently small ε, the core of $H^*_{n,m,k}$ has Property $\mathcal{D}_{\varepsilon/2}$ with probability $1 - o(1)$.

The above theorem together with Proposition 2.4 and Lemma 4.1 yields Theorem 2.2. In
the remainder of this section we prove Theorem 4.2.



4.1 Models of Random Hypergraphs 

Theorem 4.2 is stated for the $H^*_{n,m,k}$ model, where multiple edges are allowed, and each edge can also contain a vertex more than once. We start by arguing that it suffices to consider a slightly different random graph model. Let $H_{n,m,k}$ denote a random hypergraph that is created by selecting $m$ edges with $k$ different vertices in each edge without replacement. Then the following is true.

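Proposition 4.3. Let $k \ge 3$ and let $\varepsilon > 0$ be sufficiently small. Assume that $m = \lfloor cn \rfloor$ for some $c > 0$. Then, if $\mathbb{P}(H_{n,m,k} \text{ has Property } \mathcal{D}_{\varepsilon^3}) = 1 - o(1)$, then $\mathbb{P}(H^*_{n,m,k} \text{ has Property } \mathcal{D}_{\varepsilon^3/2}) = 1 - o(1)$ as well.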
Proof. First of all, recall that Proposition 2.4 implies that $H^*_{n,m,k}$ has Property $\mathcal{E}$ with probability $1 - o(1)$. Therefore, sets of size at most $\gamma n$, for some sufficiently small $\gamma > 0$, do not violate Property $\mathcal{D}_{\varepsilon^3/2}$. So it is sufficient to argue only about sets with at least $\gamma n$ vertices.

Let us call an edge in $H^*_{n,m,k}$ bad if it either has repeated vertices or if there is another edge that contains exactly the same vertices. For each of these edges we resample new edges until the resulting hypergraph contains $m$ different edges with $k$ distinct vertices in each. Note that this process yields $H_{n,m,k}$. A trivial calculation reveals that with probability $1 - o(1)$ the random hypergraph $H^*_{n,m,k}$ has at most $2\log n$ bad edges.

In the above process, sets with at least $\gamma n$ vertices may change their number of edges by at most $2\log n$. Thus, for $n$ large enough, if $H_{n,m,k}$ has the property and there are at most $2\log n$ bad edges, then $H^*_{n,m,k}$ must have Property $\mathcal{D}_{\varepsilon^3/2}$. The statement of the proposition now follows. □
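The resampling step in the proof above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the helper names `sample_h_star` and `resample_bad_edges` are hypothetical, Python's `random` module stands in for truly random choices, and every bad edge (repeated vertex, or vertex set shared with another edge) is replaced by a fresh $k$-set of distinct vertices that does not duplicate an edge already kept.

```python
import random
from collections import Counter


def sample_h_star(n, m, k, rng):
    """H*_{n,m,k}: m edges, each a k-tuple of independent uniform vertices."""
    return [tuple(rng.randrange(n) for _ in range(k)) for _ in range(m)]


def resample_bad_edges(edges, n, k, rng):
    """Replace bad edges until there are m distinct edges of k distinct vertices.

    An edge is bad if it repeats a vertex or if another edge has exactly the
    same vertex set. Returns the new edge list and the number of bad edges.
    """
    counts = Counter(frozenset(e) for e in edges)
    good = [frozenset(e) for e in edges
            if len(set(e)) == k and counts[frozenset(e)] == 1]
    n_bad = len(edges) - len(good)
    present = set(good)
    while len(good) < len(edges):
        candidate = frozenset(rng.sample(range(n), k))  # k distinct vertices
        if candidate not in present:                    # avoid new duplicates
            good.append(candidate)
            present.add(candidate)
    return good, n_bad


rng = random.Random(1)
edges, n_bad = resample_bad_edges(sample_h_star(2000, 1000, 3, rng), 2000, 3, rng)
print(len(edges), n_bad)
```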

Thus, proving Theorem 4.2 for $H_{n,m,k}$ (where we use $\varepsilon^3$ instead of $\varepsilon^3/2$) is sufficient. Our remaining proof strategy is inspired by the ideas in [14]. We will use the following auxiliary statements about binomial coefficients.

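Proposition 4.4. Let $H(x) = -x\log x - (1-x)\log(1-x)$ denote the entropy function. Then, for any $0 < \alpha, \delta < 1$,

$$\binom{n}{\alpha n} = \frac{1+o(1)}{\sqrt{2\pi\alpha(1-\alpha)n}}\, e^{H(\alpha)n} \quad\text{and}\quad \binom{n}{\alpha n \pm \delta n} \le \binom{n}{\alpha n}\, e^{n\delta\log(\max\{1/\alpha,\, 1/(1-\alpha)\})}.$$

Proof. The first statement is well known and follows immediately from Stirling's approximation of the factorial function; we omit the details. To see the second statement, let us first consider the case with the "+". We can assume that $\alpha + \delta < 1$, as otherwise the statement holds trivially. Then

$$\frac{\binom{n}{\alpha n+\delta n}}{\binom{n}{\alpha n}} = \prod_{i=1}^{\delta n}\frac{n-\alpha n-i+1}{\alpha n+i} \le \left(\frac{(1-\alpha)n}{\alpha n}\right)^{\delta n} \le e^{n\delta\log(1/\alpha)}.$$

Similarly, for the case with the "-" we obtain

$$\frac{\binom{n}{\alpha n-\delta n}}{\binom{n}{\alpha n}} = \prod_{i=0}^{\delta n-1}\frac{\alpha n-i}{n-\alpha n+i+1} \le \left(\frac{\alpha n}{(1-\alpha)n}\right)^{\delta n} \le e^{n\delta\log(1/(1-\alpha))}.$$

□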



For the sake of convenience we will carry out our calculations in the $H_{n,p,k}$ model of random $k$-graphs. This is the "higher-dimensional" analogue of the well-studied $G_{n,p}$ model, where, given $n \ge k$ vertices, we include each $k$-tuple of vertices with probability $p$, independently of every other $k$-tuple. Standard arguments show that if we adjust $p$ suitably, then $H_{n,p,k}$ is essentially equivalent to $H_{n,cn,k}$. The following proposition makes this more precise.

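Proposition 4.5. Let $\mathcal{P}$ be any property of hypergraphs, and let $p = ck/\binom{n-1}{k-1}$, where $c > 0$. Then

$$\mathbb{P}(H_{n,cn,k} \in \mathcal{P}) \le O(\sqrt{n})\cdot\mathbb{P}(H_{n,p,k} \in \mathcal{P}).$$

Proof. Let $N = \binom{n}{k}$, and note that $pN = cn$. Hence,

$$\mathbb{P}(H_{n,p,k} \text{ has } cn \text{ edges}) = \binom{N}{cn}p^{cn}(1-p)^{N-cn} = \Theta(n^{-1/2}).$$

The claim then follows from

$$\mathbb{P}(H_{n,cn,k} \in \mathcal{P}) = \mathbb{P}(H_{n,p,k} \in \mathcal{P} \mid H_{n,p,k} \text{ has } cn \text{ edges}) \le \frac{\mathbb{P}(H_{n,p,k} \in \mathcal{P})}{\mathbb{P}(H_{n,p,k} \text{ has } cn \text{ edges})}.$$

□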
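The key estimate in this proof, $\mathbb{P}(H_{n,p,k} \text{ has } cn \text{ edges}) = \Theta(n^{-1/2})$, is easy to check numerically. The sketch below (hypothetical helper names; the binomial coefficient is evaluated in log space) computes the exact probability for $k = 3$ and $c = 1/2$ and compares it with the local-limit value $1/\sqrt{2\pi cn}$.

```python
import math


def log_comb_small_j(N, j):
    # log binom(N, j) as a sum of j logarithms; accurate when j << N
    return sum(math.log(N - i) for i in range(j)) - math.lgamma(j + 1)


def prob_exactly_cn_edges(n, k, c):
    """P(Bin(N, p) = cn) with N = binom(n, k) and p = ck / binom(n-1, k-1)."""
    N = math.comb(n, k)
    p = c * k / math.comb(n - 1, k - 1)   # chosen so that p * N = c * n
    j = round(c * n)
    log_pmf = log_comb_small_j(N, j) + j * math.log(p) + (N - j) * math.log1p(-p)
    return math.exp(log_pmf)


for n in (1000, 4000):
    prob = prob_exactly_cn_edges(n, 3, 0.5)
    # the last column is the ratio to 1/sqrt(2*pi*c*n) and should be close to 1
    print(n, prob, prob * math.sqrt(2 * math.pi * 0.5 * n))
```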



In order to prove Theorem 4.2 it is therefore sufficient to show that the core of $H_{n,p,k}$ has Property $\mathcal{D}_{\varepsilon^3}$ with probability $1 - o(n^{-1/2})$. This is accomplished in the next sections.



4.2 Working on the Core of $H_{n,p,k}$: the Cloning Model

Recall that the core of a hypergraph is its maximum subgraph that has minimum degree (at least) 2. At this point we introduce the main tool for our analysis. The cloning model with parameters $(N, D, k)$, where $N \ge 1$ and $D \ge 0$ are random variables taking integral values, is defined as follows. We generate a graph in three stages.

1. We expose the value of $N$.

2. We expose the degrees $\mathbf{d} = (d_1, \dots, d_N)$, where the $d_v$'s are independent and identically distributed as $D$.

3. For each $1 \le v \le N$ we generate $d_v$ copies, which we call $v$-clones or simply clones. Then we choose uniformly at random a matching from all perfect $k$-matchings on the set of all clones. Note that such a matching may not exist; in this case we choose a random matching that leaves fewer than $k$ clones unmatched. Finally, we construct the graph $H_{\mathbf{d},k}$ by contracting the clones to vertices, i.e., by projecting the clones of $v$ onto $v$ itself for every $1 \le v \le N$.

Note that the last stage in the above procedure is equivalent to the configuration model $H_{\mathbf{d},k}$ for random hypergraphs with degree sequence $\mathbf{d} = (d_1, \dots, d_N)$. In other words, $H_{\mathbf{d},k}$ is a random multigraph where the $i$th vertex has degree $d_i$.
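Stage 3 (equivalently, the configuration model $H_{\mathbf{d},k}$) has a very short implementation: shuffling the list of clones and cutting it into consecutive blocks of size $k$ yields a uniformly random $k$-matching. The function name below is hypothetical, and the sketch assumes the degree sequence is given as a Python list indexed by vertex.

```python
import random
from collections import Counter


def configuration_model(d, k, rng):
    """Sample H_{d,k}: vertex v gets d[v] clones; clones are k-matched at random.

    After the shuffle every ordering of the clones is equally likely, and each
    k-matching arises from the same number of orderings, so cutting the list
    into blocks of k gives a uniform matching. If k does not divide sum(d),
    the last sum(d) % k clones stay unmatched.
    """
    clones = [v for v, dv in enumerate(d) for _ in range(dv)]
    rng.shuffle(clones)
    # Contract each block of k clones to the vertices owning them; multi-edges
    # and repeated vertices are possible, since H_{d,k} is a multigraph.
    return [tuple(clones[i:i + k]) for i in range(0, len(clones) - k + 1, k)]


rng = random.Random(0)
d = [2, 3, 1, 2, 1]
edges = configuration_model(d, 3, rng)
print(edges, Counter(v for e in edges for v in e))
```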

One special instantiation of the cloning model is the so-called Poisson cloning model $\tilde H_{n,p,k}$ for $k$-graphs with $n$ vertices and parameter $p \in [0,1]$, which was introduced by Kim [18]. There, we choose $N = n$ with probability 1, and the distribution $D$ is the Poisson distribution with parameter $\lambda := p\binom{n-1}{k-1}$. Note that here $D$ is essentially the vertex degree distribution in the binomial random graph $H_{n,p,k}$, so we would expect that the two models behave similarly. The following statement confirms this, and is implied by Theorem 1.1 in [18].

Theorem 4.6. Let $k \ge 2$ and suppose that $p = \Theta(n^{-k+1})$. Then there is a $C > 0$ such that for any property $\mathcal{P}$ of $k$-graphs

$$\mathbb{P}(H_{n,p,k} \in \mathcal{P}) \le C\,\mathbb{P}(\tilde H_{n,p,k} \in \mathcal{P}) + e^{-n}.$$
One big advantage of the Poisson cloning model is that it provides a very precise description of the core. In particular, Theorem 6.2 in [18] implies the following statement, where we write "$x \pm y$" for the interval of numbers $(x-y, x+y)$.

Theorem 4.7. Let $\lambda_k := \min_{x>0} \frac{x}{(1-e^{-x})^{k-1}}$, let $0 < \delta < 1$, and let $c$ be such that $ck = p\binom{n-1}{k-1} > \lambda_k$. Moreover, let $x$ be the largest solution of the equation $x = (1-e^{-xck})^{k-1}$, and set $\xi := xck$. Then the following is true with probability $1 - n^{-\omega(1)}$. If $N_2$ denotes the number of vertices in the core of $\tilde H_{n,p,k}$, then

$$N_2 = (1 - e^{-\xi} - \xi e^{-\xi})\,n \pm \delta n.$$

Furthermore, the core itself is distributed like the cloning model $(N_2, \mathrm{Po}_{\ge 2}(\Lambda_{c,k}), k)$, where $\mathrm{Po}_{\ge 2}(\Lambda_{c,k})$ denotes a Poisson random variable with parameter $\Lambda_{c,k}$ conditioned on being at least 2, and $\Lambda_{c,k} = \xi + \beta$ for some $|\beta| \le \delta$.
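The prediction for $N_2$ can be compared with a small simulation. The sketch below is only an illustration under stated assumptions: it computes the 2-core by repeatedly deleting a vertex of degree at most 1 together with its edge (the standard peeling procedure), it uses $m = cn$ uniformly random edges with $k$ distinct vertices instead of the Poisson cloning model, and it obtains $\xi$ by iterating $x \mapsto (1-e^{-xck})^{k-1}$ from $x = 1$, which decreases monotonically to the largest fixed point. All function names are hypothetical.

```python
import math
import random


def two_core(edges, n):
    """Peel the 2-core; return (indices of core edges, core vertices)."""
    incident = [[] for _ in range(n)]
    for idx, e in enumerate(edges):
        for v in e:
            incident[v].append(idx)
    deg = [len(lst) for lst in incident]
    alive = [True] * len(edges)
    stack = [v for v in range(n) if deg[v] == 1]
    while stack:
        v = stack.pop()
        if deg[v] != 1:          # degree changed since v was pushed
            continue
        idx = next(i for i in incident[v] if alive[i])  # the unique live edge
        alive[idx] = False
        for u in edges[idx]:
            deg[u] -= 1
            if deg[u] == 1:
                stack.append(u)
    core_edges = [i for i, a in enumerate(alive) if a]
    core_vertices = [v for v in range(n) if deg[v] >= 2]
    return core_edges, core_vertices


def predicted_core_fraction(c, k, iters=5000):
    """1 - e^{-xi} - xi e^{-xi}, with xi = x c k as in Theorem 4.7."""
    x = 1.0
    for _ in range(iters):
        x = (1.0 - math.exp(-x * c * k)) ** (k - 1)
    xi = x * c * k
    return 1.0 - math.exp(-xi) - xi * math.exp(-xi)


rng = random.Random(7)
n, c, k = 20000, 0.9, 3
edges = [tuple(rng.sample(range(n), k)) for _ in range(int(c * n))]
_, core_vertices = two_core(edges, n)
print(len(core_vertices) / n, predicted_core_fraction(c, k))
```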

We shall say that a random variable is a 2-truncated Poisson variable if it is distributed like a Poisson variable conditioned on being at least 2. The next statement is taken from [14, Corollary 3.4].
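It helps to have the mean of a 2-truncated Poisson variable at hand: $\mathbb{E}[\mathrm{Po}_{\ge 2}(\lambda)] = \lambda(1-e^{-\lambda})/(1-e^{-\lambda}-\lambda e^{-\lambda})$, which matches the ratio $kM_2/N_2$ appearing in Corollary 4.8 below (every edge of the core has $k$ clones). A minimal sketch with hypothetical function names compares this closed form with a direct summation of the series.

```python
import math


def trunc_poisson_mean(lam):
    """Closed form for E[Po(lam) | Po(lam) >= 2]."""
    one_minus_exp = -math.expm1(-lam)                       # 1 - e^{-lam}
    return lam * one_minus_exp / (one_minus_exp - lam * math.exp(-lam))


def trunc_poisson_mean_series(lam, terms=200):
    """Same quantity by summing j * P(Po(lam) = j) over j >= 2."""
    pmf = [math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
           for j in range(terms)]
    return sum(j * pmf[j] for j in range(2, terms)) / sum(pmf[2:])


for lam in (0.5, 1.0, 2.1, 5.0):
    print(lam, trunc_poisson_mean(lam), trunc_poisson_mean_series(lam))
```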

Corollary 4.8. Let $\delta > 0$. Let $N_2$ and $M_2$ denote the number of vertices and edges in the core of $H_{n,p,k}$, where $p = ck/\binom{n-1}{k-1}$ and $ck > \lambda_k$, where $\lambda_k$ is defined in Theorem 4.7. Then, with probability $1 - n^{-\omega(1)}$,

$$N_2 = (1 - e^{-\xi} - \xi e^{-\xi})\,n \pm \delta n \quad\text{and}\quad M_2 = \frac{\xi(1-e^{-\xi})}{k(1-e^{-\xi}-\xi e^{-\xi})}\,N_2 \pm \delta n,$$

where $\xi = xck$ and $x$ is the largest solution of the equation $x = (1-e^{-xck})^{k-1}$.



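In the following we will collect a few basic properties of the relation between the number of vertices and edges in the core of $H_{n,p,k}$. First of all, define the functions

$$f(x) = \frac{x(1-e^{-x})}{k(1-e^{-x}-xe^{-x})} \quad\text{and}\quad g(x) = \frac{x}{k(1-e^{-x})^{k-1}},$$

and recall that $c_k^*$ and $\xi^*$ in Theorem 1.1 and also Theorem 4.2 are given by the solution of the system

$$1 = f(\xi^*) \quad\text{and}\quad c_k^* = g(\xi^*). \tag{4.5}$$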
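System (4.5) is easy to solve numerically because $f$ is increasing. The sketch below (function names are ours, not from the paper) finds $\xi^*$ by bisection on $f(\xi) = 1$ and returns $c_k^* = g(\xi^*)$; for $k = 3, 4, 5$ it reproduces the familiar load thresholds $c_3^* \approx 0.918$, $c_4^* \approx 0.977$ and $c_5^* \approx 0.992$.

```python
import math


def f(x, k):
    """f(x) = x(1 - e^{-x}) / (k (1 - e^{-x} - x e^{-x}))."""
    a = -math.expm1(-x)                   # 1 - e^{-x}, computed stably
    return x * a / (k * (a - x * math.exp(-x)))


def g(x, k):
    """g(x) = x / (k (1 - e^{-x})^{k-1})."""
    return x / (k * (-math.expm1(-x)) ** (k - 1))


def xi_star(k, iters=200):
    """Solve f(xi) = 1 by bisection: f(0+) = 2/k < 1 and f(k) > 1 for k >= 3."""
    lo, hi = 1e-3, float(k)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, k) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k in (3, 4, 5):
    xs = xi_star(k)
    print(k, round(xs, 4), round(g(xs, k), 4))
```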

An easy calculation shows that $f(x)$ is an increasing function of $x$ and infinitely differentiable over $\mathbb{R}^+$, and that $g(x)$ has a unique minimum, which we shall denote by $x_g$. Moreover, $g(x_g) = \lambda_k/k$, where $\lambda_k$ is defined in Theorem 4.7. We shall need the following technical claim.

Claim 4.9. $x_g < \xi^*$.

Proof. A simple calculation reveals that

$$g'(x) = \frac{1 - e^{-x} - (k-1)xe^{-x}}{k(1-e^{-x})^k}.$$

Let $x_0 := 2\log(k-1)$. The numerator of $g'(x_0)$ is

$$1 - \frac{1}{(k-1)^2} - \frac{2\log(k-1)}{k-1},$$

which is easily checked to be greater than zero for all $k \ge 3$. Hence $g'(x_0) > 0$ and thus $x_g < x_0$.

In the remainder we argue that $\xi^* > k/2$, which settles the claim with $x_0 = 2\log(k-1) < k/2$. Note that the monotonicity of $f$ guarantees that it is enough to show that $f(k/2) < 1$. Using the estimate $e^x > 1 + x + x^2/2$, which is valid for all $x > 0$, we obtain

$$f(k/2) = \frac{1}{2}\cdot\frac{e^{k/2}-1}{e^{k/2}-1-k/2} = \frac{1}{2}\left(1 + \frac{k/2}{e^{k/2}-1-k/2}\right) < \frac{1}{2}\left(1+\frac{4}{k}\right).$$

Note that for $k \ge 4$ the right-hand side is at most 1, thus concluding the proof in these cases. Finally, if $k = 3$, then numerical calculations imply that $\xi^* > 2.14 > 2\log 2$. □
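As a quick numerical sanity check of Claim 4.9 (not a substitute for the proof), one can compute $x_g$, the root of the numerator $1 - e^{-x} - (k-1)xe^{-x}$ of $g'$, and compare it with $\xi^*$ for small $k$, together with the estimate $f(k/2) < \frac12(1 + 4/k)$. The sketch uses plain bisection; all names are hypothetical.

```python
import math


def f(x, k):
    a = -math.expm1(-x)                   # 1 - e^{-x}
    return x * a / (k * (a - x * math.exp(-x)))


def bisect(func, lo, hi, iters=200):
    """Sign-change root of func on [lo, hi], assuming func(lo) < 0 <= func(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def x_g(k):
    # numerator of g'(x): negative near 0, positive at x_0 = 2 log(k - 1)
    numerator = lambda x: 1 - math.exp(-x) - (k - 1) * x * math.exp(-x)
    return bisect(numerator, 1e-6, 2 * math.log(k - 1))


def xi_star(k):
    return bisect(lambda x: f(x, k) - 1.0, 1e-3, float(k))


for k in range(3, 9):
    print(k, round(x_g(k), 4), round(xi_star(k), 4))
```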

Let us assume that $p = ck/\binom{n-1}{k-1}$, where $c = (1-\varepsilon)c_k^*$ and $c > \lambda_k/k$, and set $\xi = xck$, where $x$ is the largest solution of the equation $x = (1-e^{-xck})^{k-1}$. So, $\xi$ is the largest solution of $c = g(\xi)$, implying with the above claim that $\xi < \xi^*$. Therefore, we have $f(\xi) < 1$, and Corollary 4.8 guarantees that with probability $1 - n^{-\omega(1)}$ the density of the core of $H_{n,p,k}$ is less than 1. This argument can be extended to obtain the following finer statement.

Corollary 4.10. Let $\delta > 0$ be sufficiently small and choose $c > \lambda_k/k$ such that the largest solution $\xi$ of the equation $c = g(\xi)$ satisfies $\xi = \xi^* - \delta$, where $\xi^*$ is as in Theorem 1.1. Then there is an $\varepsilon > 0$ such that $\varepsilon = \Theta(\delta)$ and $c = (1-\varepsilon)c_k^*$. Moreover, there is a constant $e_k > 0$ such that with probability $1 - n^{-\omega(1)}$

$$M_2 = N_2\left(1 - e_k\delta + O(\delta^2)\right).$$

Proof. We first show that there is an $\varepsilon > 0$ with the claimed properties. Note that $\xi$ is defined through the equation $c = g(\xi)$ and $\xi^*$ through $c_k^* = g(\xi^*)$. Let $x_g$ be the minimizer of $g$, i.e., $g(x_g) = \lambda_k/k$, and note that whenever $x > x_g$ we have $g'(x) > 0$. By applying Taylor's Theorem and using (4.5) we infer that there is a $\mu \in [\xi, \xi^*]$ such that

$$c = g(\xi) = g(\xi^*) - g'(\mu)\,\delta = c_k^* - g'(\mu)\,\delta.$$

So $c = (1-\varepsilon)c_k^*$, where $\varepsilon = g'(\mu)\delta/c_k^*$, and note that $g'(\mu)$ remains bounded away from $0$ and $\infty$ for $\mu \in [\xi, \xi^*]$ whenever $\delta$ is sufficiently small.

To see the claim for $M_2$, note that Corollary 4.8 (where we use $\delta^2$ for $\delta$) guarantees that with probability $1 - n^{-\omega(1)}$ we may assume that $M_2 = (f(\xi) \pm O(\delta^2))N_2$. Moreover, Taylor's Theorem, this time applied to $f$, implies that

$$f(\xi) = f(\xi^*) + f'(\xi^*)(\xi - \xi^*) + O((\xi-\xi^*)^2) = 1 - f'(\xi^*)\,\delta + O(\delta^2),$$

thus concluding the proof with $e_k = f'(\xi^*)$ and the fact that $f$ is increasing. □

We immediately obtain the following corollary. 

Corollary 4.11. Let $k \ge 3$. Let $\varepsilon > 0$ be sufficiently small and suppose that $p = (1-\varepsilon)c_k^*k/\binom{n-1}{k-1}$. Then, with probability $1 - n^{-\omega(1)}$,

$$M_2 \le (1-\varepsilon^3)N_2.$$
4.3 Subgraphs of the 2-Core

In order to obtain tight bounds for the probability that there are $(1-\varepsilon^3)$-dense subsets in the core of $H_{n,p,k}$, we will exploit the following statement.

Proposition 4.12. Let $H$ be a $k$-uniform hypergraph such that $e_H < \lceil(1-\gamma)v_H\rceil$. Moreover, let $U$ be an inclusion-maximal subset of $V_H$ such that $e_U \ge (1-\gamma)v_U$. Then $e_U = \lceil(1-\gamma)v_U\rceil$ and all edges $e \in E_H$ satisfy $|e \cap U| \ne k-1$.

Proof. If $e_U > \lceil(1-\gamma)v_U\rceil$, then $e_U \ge (1-\gamma)v_U + 1$; let $U' = U \cup \{v\}$, where $v$ is any vertex in $V_H \setminus U$. Note that such a vertex always exists, as $U \ne V_H$ (the integer $e_H$ satisfies $e_H < \lceil(1-\gamma)v_H\rceil$, hence $e_H < (1-\gamma)v_H$). Moreover, denote by $d$ the degree of $v$ in $U'$, i.e., the number of edges in $H$ that contain $v$ and whose other vertices all lie in $U$. Then

$$e_{U'} = e_U + d \ge e_U \ge (1-\gamma)v_U + 1.$$

Note that $v_{U'} = v_U + 1$. Hence, the above inequality implies that

$$e_{U'} \ge (1-\gamma)(v_U+1) - (1-\gamma) + 1 \ge (1-\gamma)v_{U'},$$

which contradicts the maximality of $U$. Similarly, if there were an edge $e$ such that $|e \cap U| = k-1$, then we could construct a larger subset of $V_H$ that also satisfies the density requirement by adding the vertex in $e \setminus U$ to $U$. □
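Proposition 4.12 can be checked by brute force on small hypergraphs. The sketch below (hypothetical names; exact arithmetic via `fractions.Fraction`, so that the density threshold is not affected by rounding) enumerates all vertex subsets, keeps the inclusion-maximal $(1-\gamma)$-dense ones, and verifies both conclusions of the proposition on a concrete example with $k = 3$.

```python
import math
from fractions import Fraction
from itertools import combinations


def edges_inside(U, edges):
    """e_U: number of edges entirely contained in U."""
    return sum(1 for e in edges if set(e) <= U)


def maximal_dense_sets(n, edges, gamma):
    """All inclusion-maximal U with e_U >= (1 - gamma)|U| (exponential time)."""
    dense = [frozenset(U)
             for r in range(n + 1)
             for U in combinations(range(n), r)
             if edges_inside(set(U), edges) >= (1 - gamma) * r]
    return [U for U in dense if not any(U < W for W in dense)]


# Example: k = 3, seven vertices, gamma = 1/8, so e_H = 6 < ceil(7 * 7/8) = 7.
edges = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (3, 4, 5), (4, 5, 6)]
gamma = Fraction(1, 8)
for U in maximal_dense_sets(7, edges, gamma):
    # both conclusions of Proposition 4.12
    assert edges_inside(U, edges) == math.ceil((1 - gamma) * len(U))
    assert all(len(set(e) & U) != 3 - 1 for e in edges)
    print(sorted(U))
```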

Let $\gamma > 0$. The following lemma bounds the probability that a given set of the core is maximal and $(1-\gamma)$-dense, assuming that the degree sequence has been exposed. That is, the randomness is that of the third stage of the exposure process in the Poisson cloning model. A similar statement was shown in [14] for the special case $\gamma = 0$.

Lemma 4.13. Let $k \ge 2$, let $\mathbf{d} = (d_1, \dots, d_N)$ be a degree sequence and let $U \subseteq \{1, \dots, N\}$ be such that $|U| = \lfloor\beta N\rfloor$, where $1/2 < \beta < 1$. Moreover, set $M = k^{-1}\sum_{i=1}^{N} d_i$ and $q = (kM)^{-1}\sum_{i\in U} d_i$. Let $0 < \gamma < 1/4$ and assume that $3N/4 \le M \le (1-\gamma)N$. If $B(\beta, q; \gamma)$ denotes the event that $U$ is an inclusion-maximal set of $H_{\mathbf{d},k}$ such that $e_U \ge (1-\gamma)|U|$, then

$$\mathbb{P}_{\mathbf{d},k}(B(\beta,q;\gamma)) \le \max\left\{1, \binom{M}{|U|}\right\}\cdot(2^k-k-1)^{M-|U|}\cdot e^{-kMH(q)}\cdot e^{O(\gamma\log(1/\gamma)M)},$$

where $H(x) = -x\ln x - (1-x)\ln(1-x)$ denotes the entropy function, and $\mathbb{P}_{\mathbf{d},k}$ denotes the probability measure on the space of Stage 3, given the outcomes of the first two stages.

Proof. The graph $H_{\mathbf{d},k}$ is obtained by creating $d_i$ clones for each $1 \le i \le N$ and by choosing uniformly at random a perfect $k$-matching on this set of clones. Note that this is the same as throwing $kM$ balls into $M$ bins, such that every bin contains $k$ balls. We use this analogy to prove the claim as follows. Assume that we color the $kqM$ clones of the vertices in $U$ red, and the remaining $k(1-q)M$ clones blue. So, by applying Proposition 4.12, we are interested in the probability of the event that there are exactly $\lceil(1-\gamma)|U|\rceil$ bins with $k$ red balls and no bin that contains exactly one blue ball.

We estimate the probability of this event as follows. We start by putting into each bin $k$ black balls, labeled with the numbers $1, \dots, k$. Let $\mathcal{K} = \{1, \dots, k\}$, and let $X_1, \dots, X_M$ be independent random sets such that for $1 \le i \le M$

$$\forall\, \mathcal{K}' \subseteq \mathcal{K}:\quad \mathbb{P}(X_i = \mathcal{K}') = q^{|\mathcal{K}'|}(1-q)^{k-|\mathcal{K}'|}.$$
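Note that $|X_i|$ is distributed like $\mathrm{Bin}(k,q)$. We then recolor the balls in the $i$th bin that are in $X_i$ red, and all others blue. We infer that the total number of red balls is $X = \sum_{i=1}^{M}|X_i|$. Set

$$Z = \mathbb{P}(X = kqM).$$

Note that $\mathbb{E}(X) = kqM$, and that $X$ is distributed like $\mathrm{Bin}(kM, q)$. By applying Proposition 4.4 we infer that

$$Z = \mathbb{P}(X = \mathbb{E}(X)) = (1+o(1))\left(2\pi q(1-q)kM\right)^{-1/2}.$$

Let $R_j$ be the number of $X_i$'s that contain $j$ elements, and let $e_U = \lceil(1-\gamma)|U|\rceil$. Conditioned on $X = kqM$, the red balls occupy a uniformly random set of $kqM$ positions, so by using the above notation we may estimate

$$\mathbb{P}_{\mathbf{d},k}(B(\beta,q;\gamma)) = \frac{\mathbb{P}(X = kqM \wedge R_k = e_U \wedge R_{k-1} = 0)}{Z} \le \sqrt{2kM}\cdot\mathbb{P}(X = kqM \wedge R_k = e_U \wedge R_{k-1} = 0), \tag{4.6}$$

where the inequality holds for large $M$ because $2\pi q(1-q) \le \pi/2 < 2$. Let $p_j = \mathbb{P}(|X_i| = j) = \binom{k}{j}q^j(1-q)^{k-j}$. Moreover, define the set of integer sequences

$$\mathcal{A} = \left\{(b_0,\dots,b_{k-2}) \in \mathbb{N}^{k-1} :\ \sum_{j=0}^{k-2} b_j = M - e_U \ \text{ and } \ \sum_{j=0}^{k-2} j\,b_j = kqM - k e_U\right\}.$$

Then

$$\mathbb{P}(X = kqM \wedge R_k = e_U \wedge R_{k-1} = 0) = \sum_{(b_0,\dots,b_{k-2})\in\mathcal{A}}\binom{M}{e_U, b_0, \dots, b_{k-2}}\,p_k^{e_U}\prod_{j=0}^{k-2}p_j^{b_j}.$$

Observe that the summand can be rewritten as

$$\binom{M}{e_U}\binom{M-e_U}{b_0,\dots,b_{k-2}}\,q^{kqM}(1-q)^{k(1-q)M}\prod_{j=0}^{k-2}\binom{k}{j}^{b_j} = \binom{M}{e_U}\binom{M-e_U}{b_0,\dots,b_{k-2}}\,e^{-kMH(q)}\prod_{j=0}^{k-2}\binom{k}{j}^{b_j}.$$

By applying the multinomial theorem we obtain the bound

$$\sum_{(b_0,\dots,b_{k-2})\in\mathcal{A}}\binom{M-e_U}{b_0,\dots,b_{k-2}}\prod_{j=0}^{k-2}\binom{k}{j}^{b_j} \le \left(\sum_{j=0}^{k-2}\binom{k}{j}\right)^{M-e_U} = (2^k-k-1)^{M-e_U}.$$

Thus, from (4.6) we infer that for large $M$

$$\mathbb{P}_{\mathbf{d},k}(B(\beta,q;\gamma)) \le \sqrt{2kM}\,\binom{M}{e_U}(2^k-k-1)^{M-e_U}e^{-kMH(q)}.$$

Since $0 \le |U| - e_U \le \gamma|U|$, we have $(2^k-k-1)^{M-e_U} \le (2^k-k-1)^{M-|U|}e^{O(\gamma M)}$, and for large $M$ the factor $\sqrt{2kM}$ is also absorbed by $e^{O(\gamma\log(1/\gamma)M)}$. The proof is completed by estimating $\binom{M}{e_U}$. More specifically, assume first that $|U| \ge (1-\gamma)M$. Then $e_U \ge (1-\gamma)^2M \ge (1-2\gamma)M$, and $\gamma < 1/4$ guarantees that

$$\binom{M}{e_U} \le \binom{M}{2\gamma M} \le \left(\frac{eM}{2\gamma M}\right)^{2\gamma M} = e^{O(\gamma\log(1/\gamma)M)}.$$

Otherwise, let us write $|U| = \lfloor\beta N\rfloor = \eta M$ for some appropriate $\eta < 1-\gamma$. Note that $\beta > 1/2$ and $M \le (1-\gamma)N$ guarantee that $\eta > 1/2$. By applying Proposition 4.4 we obtain

$$\binom{M}{e_U} = \binom{M}{(1-\gamma)\eta M} \le \binom{M}{\eta M}\,e^{\gamma\eta M\log(\max\{1/\eta,\,1/(1-\eta)\})} \le \binom{M}{|U|}\,e^{O(\gamma\log(1/\gamma)M)}.$$

□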
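The multinomial-theorem bound in the proof above can be compared with the exact probability for small parameters. In the sketch below (hypothetical names), a dynamic program over the $M$ bins computes $\mathbb{P}(X = kqM \wedge R_k = e_U \wedge R_{k-1} = 0)$ exactly, and $\binom{M}{e_U}(2^k-k-1)^{M-e_U}e^{-kMH(q)}$ is evaluated alongside; the choice $k = 3$, $M = 10$, $q = 0.4$ (so that $kqM = 12$ is an integer) is arbitrary.

```python
import math


def exact_probability(M, k, q, e_full):
    """P(X = kqM, R_k = e_full, R_{k-1} = 0) for independent |X_i| ~ Bin(k, q).

    DP over bins; the state is (red balls so far, bins with k red balls).
    Bins with exactly k - 1 red balls (one blue ball) are forbidden.
    """
    target = round(k * q * M)
    p = [math.comb(k, j) * q**j * (1 - q) ** (k - j) for j in range(k + 1)]
    dp = {(0, 0): 1.0}
    for _ in range(M):
        nxt = {}
        for (red, full), prob in dp.items():
            for j in range(k + 1):
                if j == k - 1:
                    continue
                state = (red + j, full + (j == k))
                if state[0] <= target and state[1] <= e_full:
                    nxt[state] = nxt.get(state, 0.0) + prob * p[j]
        dp = nxt
    return dp.get((target, e_full), 0.0)


def multinomial_bound(M, k, q, e_full):
    """binom(M, e_full) * (2^k - k - 1)^(M - e_full) * exp(-k M H(q))."""
    entropy = -q * math.log(q) - (1 - q) * math.log(1 - q)
    return (math.comb(M, e_full) * (2**k - k - 1) ** (M - e_full)
            * math.exp(-k * M * entropy))


for e_full in range(5):
    print(e_full, exact_probability(10, 3, 0.4, e_full),
          multinomial_bound(10, 3, 0.4, e_full))
```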



With the above lemma at hand we are ready to estimate the number of $(1-\varepsilon^3)$-dense sets in the core of $H_{n,p,k}$. Let us make some auxiliary preparations first. Suppose that the degree sequence of the core $C$ is given by $\mathbf{d} = (d_1, \dots, d_{N_2})$. Thus, the number of edges in $C$ is $M_2 = k^{-1}\sum_{i=1}^{N_2} d_i$. For $\beta, q \in [0,1]$ let $X_{q,\beta} = X_{q,\beta}(C) = X_{q,\beta}(\mathbf{d})$ denote the number of subsets of $C$ with $\lfloor\beta N_2\rfloor$ vertices and total degree $\lfloor qkM_2\rfloor$. (We will omit writing "$\lfloor\cdot\rfloor$" from now on.) Note that $X_{q,\beta}$ is a random variable that depends only on the outcomes of the first two stages of the exposure of the core. Let also $Y_{q,\beta}$ denote the number of these sets that are maximal and $(1-\varepsilon^3)$-dense.

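Let $\delta > 0$. Moreover, let $p = ck/\binom{n-1}{k-1}$ be such that the largest solution $\xi$ of the equation $g(\xi) = c$ satisfies $\xi = \xi^* - \delta$, where $g(\xi^*) = c_k^*$. By applying Corollary 4.10 we infer that there is an $\varepsilon = \Theta(\delta)$ such that $c = (1-\varepsilon)c_k^*$. Moreover, Theorem 4.7 (where we use $\delta^2$ for $\delta$) guarantees that with probability $1 - n^{-\omega(1)}$

$$N_2 = n(1 - e^{-\xi} - \xi e^{-\xi}) \pm \delta^2 n \quad\text{and}\quad \Lambda_{c,k} = \xi \pm \delta^2, \quad\text{where } \xi = \xi^* - \delta.$$

Set

$$n_2 = (1 - e^{-\xi} - \xi e^{-\xi})\,n \quad\text{and}\quad m_2 = \frac{\xi(1-e^{-\xi})}{k(1-e^{-\xi}-\xi e^{-\xi})}\,n_2,$$

and let $A$ be the event

$$A:\quad N_2 = n_2 \pm \delta^2 n \quad\text{and}\quad M_2 = m_2 \pm \delta^2 n. \tag{4.7}$$

Corollary 4.8 implies that $\mathbb{P}(A) = 1 - n^{-\omega(1)}$. Moreover, Corollary 4.10 guarantees the existence of an $e_k > 0$ such that

$$m_2 = (1 - e_k\delta + O(\delta^2))\,n_2. \tag{4.8}$$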

We shall assume all the above facts in the remainder. We are ready to prove the main result of this section, which addresses sets with more than $0.7N_2$ vertices.

Lemma 4.14. Let $\beta \in [0.7, 1 - e_k\delta/2]$ and let $\beta \le q \le 1 - 2(1-\beta)/k$. Then, for sufficiently small $\delta > 0$,

$$\mathbb{P}(Y_{q,\beta} > 0) = n^{-\omega(1)}.$$

Moreover, when $q < \beta$ or $q > 1 - 2(1-\beta)/k$, the above probability is 0.

Proof. The proof follows the arguments in [14], see Corollary 4.5 to Lemma 4.11 there, so we only outline the relevant steps. First of all, suppose that we have exposed the degree sequence $\mathbf{d}$ of the core. Markov's inequality implies that

$$\mathbb{P}(Y_{q,\beta} > 0 \mid \mathbf{d}) \le X_{q,\beta}(\mathbf{d})\,\mathbb{P}_{\mathbf{d},k}(B(\beta, q; \varepsilon^3)),$$

where $B(\beta, q; \varepsilon^3)$ is as in Lemma 4.13. Note that for sufficiently small $\delta > 0$, (4.8), the upper bound on $\beta$, and Proposition 4.4 imply



By conditioning on $A$, taking expectations on both sides, and applying Lemma 4.13 we infer that (see also Lemma 4.6 in [14]) for $\delta > 0$ sufficiently small

$$\mathbb{P}(Y_{q,\beta} > 0) \le \mathbb{E}(X_{q,\beta} \mid A)\cdot\binom{m_2}{\beta n_2}\cdot(2^k-k-1)^{m_2-\beta n_2}\cdot e^{-km_2H(q)+O(\delta^2n_2)} + n^{-\omega(1)}. \tag{4.9}$$

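The expectation of $X_{q,\beta}$, given the event $A$, is determined by calculating the probability that a specific set with $\beta N_2$ vertices has total degree $qkM_2$. This task was performed in [14], see Lemma 4.10 there. We omit the details. It follows that

$$\mathbb{E}(X_{q,\beta} \mid A) = \exp\left(n_2H(\beta) - n_2(1-\beta)\,I\!\left(\frac{k(1-q)}{1-\beta}\right) + O(n_2\delta^2)\right),$$

where

$$I(z) = \begin{cases} z(\ln T_z - \ln\xi) - \ln(e^{T_z} - T_z - 1) + \ln(e^{\xi} - \xi - 1), & \text{if } z > 2,\\ \ln 2 - 2\ln\xi + \ln(e^{\xi} - \xi - 1), & \text{if } z = 2,\end{cases}$$

and $T_z$ is the unique solution of $z = \frac{T_z(1-e^{-T_z})}{1-e^{-T_z}-T_ze^{-T_z}}$. Let

$$f(\beta, q) := 2H(\beta) + (1-\beta)\ln(2^k-k-1) - kH(q) - (1-\beta)\,I\!\left(\frac{k(1-q)}{1-\beta}\right).$$

By using (4.9) we infer that

$$\mathbb{P}(Y_{q,\beta} > 0) \le \exp\left\{n_2\left(f(\beta,q) + e_k\delta\left(kH(q) - \ln(2^k-k-1)\right) + O(\delta^2)\right)\right\} + n^{-\omega(1)}.$$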
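The rate function $I(z)$ involves the implicitly defined $T_z$; since the mean of a 2-truncated Poisson variable is increasing in its parameter, $T_z$ can be found by bisection. The sketch below (hypothetical names, with the parameter $\xi$ passed explicitly) evaluates $I$ and checks two sanity properties: $I$ vanishes at the mean of $\mathrm{Po}_{\ge 2}(\xi)$ and is positive elsewhere, as one expects if $I$ is the large-deviation rate for the average of independent $\mathrm{Po}_{\ge 2}(\xi)$ variables.

```python
import math


def trunc_mean(T):
    """Mean of Po_{>=2}(T): T(1 - e^{-T}) / (1 - e^{-T} - T e^{-T})."""
    a = -math.expm1(-T)
    return T * a / (a - T * math.exp(-T))


def T_of(z, iters=200):
    """The unique T_z > 0 with trunc_mean(T_z) = z, for z > 2."""
    lo, hi = 1e-6, float(z)    # trunc_mean(lo) is about 2, trunc_mean(z) > z
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if trunc_mean(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_I(z, xi):
    """I(z) for a given parameter xi, following the case distinction above."""
    tail = math.log(math.expm1(xi) - xi)            # ln(e^xi - xi - 1)
    if z == 2:
        return math.log(2) - 2 * math.log(xi) + tail
    T = T_of(z)
    return z * (math.log(T) - math.log(xi)) - math.log(math.expm1(T) - T) + tail


xi = 2.1
print(trunc_mean(xi), rate_I(trunc_mean(xi), xi), rate_I(2.5, xi))
```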

In [13] the following was shown.

Claim 4.15. There exists a $C > 0$ such that for any small enough $\nu > 0$ the following is true. Let $0.7 \le \beta \le 1 - \nu$ and $\beta \le q \le 1 - 2(1-\beta)/k$. Then

$$f(\beta, q) \le -C\nu + O(\delta^2).$$

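We distinguish between the following cases. First, note that if $0.7 \le \beta \le 1 - \sqrt{\delta}$, then the above claim (with $\nu = \sqrt{\delta}$) yields for sufficiently small $\delta > 0$

$$\mathbb{P}(Y_{q,\beta} > 0) \le e^{n_2(-C\sqrt{\delta} + O(\delta))} + n^{-\omega(1)} = n^{-\omega(1)}.$$

Finally, if $1 - \sqrt{\delta} < \beta \le 1 - e_k\delta/2$, then the above claim implies that there is a $C' > 0$ such that $f(\beta, q) \le C'\delta^2$. Moreover, by the monotonicity of the entropy function and $q \ge \beta$ we have for sufficiently small $\delta > 0$

$$kH(q) - \ln(2^k-k-1) \le kH(0.99) - \ln(2^k-k-1).$$

A simple calculation and the fact $H(0.99) < 0.06$ show that the above expression is negative for all $k \ge 3$. Hence the exponent in the bound on $\mathbb{P}(Y_{q,\beta} > 0)$ is $-\Omega(\delta n_2)$, and again $\mathbb{P}(Y_{q,\beta} > 0) = n^{-\omega(1)}$. □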
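The final numerical step is easy to confirm. The snippet below (illustrative only) evaluates $H(0.99)$ with the natural logarithm and checks that $kH(0.99) - \ln(2^k-k-1)$ is negative for $3 \le k \le 50$; for larger $k$ negativity also follows from $\ln(2^k-k-1) \ge (k-1)\ln 2$, which holds for all $k \ge 3$.

```python
import math


def entropy(x):
    """H(x) = -x ln x - (1 - x) ln(1 - x)."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


h = entropy(0.99)
margins = {k: k * h - math.log(2**k - k - 1) for k in range(3, 51)}
print(round(h, 4), max(margins.values()))
```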
This completes the proof of Theorem 4.2 for the case $0.7 \le \beta \le 1 - e_k\delta/2$. Now if $\beta > 1 - e_k\delta/2$, then (4.7) together with (4.8) imply that for small $\delta$ all these larger subsets have density smaller than $1 - \varepsilon^3$.

In order to also cover the remaining cases we use straightforward first moment arguments. We begin with the case $k \ge 5$.

Lemma 4.16. Let $k \ge 5$, $c < 1$ and $0 < \gamma < 0.001$. Then $H_{n,cn,k}$ contains no $(1-\gamma)$-dense subset with fewer than $0.7n$ vertices with probability at least $1 - n^{-(1-\gamma)k^2+2k+1}$.

Proof. The probability that an edge of $H_{n,cn,k}$ is contained completely in a subset $U$ of the vertex set is at most $(|U|/n)^k$. Let $k/n \le u \le 0.7$. Then

$$\mathbb{P}(\exists\ (1-\gamma)\text{-dense subset with } un \text{ vertices}) \le \binom{n}{un}\binom{cn}{(1-\gamma)un}u^{k(1-\gamma)un} \le e^{n(2H(u) + (1-\gamma)ku\ln u)},$$

where $H(x) = -x\ln x - (1-x)\ln(1-x)$ denotes the entropy function. It can easily be seen that the exponent has a unique minimum with respect to $u$ in $[0, 0.7]$, implying that it is maximized either at $u = k/n$ or at $u = 0.7$. Note that

$$2H(0.7) + (1-\gamma)k\cdot 0.7\ln(0.7) \le 2H(0.7) + (1-\gamma)\cdot 5\cdot 0.7\ln(0.7) < -0.01$$

and that

$$2H\!\left(\frac{k}{n}\right) + (1-\gamma)k\,\frac{k}{n}\ln\frac{k}{n} = \left(2k - (1-\gamma)k^2 + o(1)\right)\frac{\ln n}{n}.$$

So, the maximum is obtained at $u = k/n$, and for large $n$ we conclude the proof with

$$\mathbb{P}(\exists\ (1-\gamma)\text{-dense subset with} < 0.7n \text{ vertices}) \le \sum_{k/n \le u \le 0.7} n^{-(1-\gamma)k^2+2k} \le n^{-(1-\gamma)k^2+2k+1}.$$

□
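The numerical fact at $u = 0.7$ used in this proof can be reproduced directly. The sketch below (hypothetical helper names) evaluates the exponent $2H(u) + (1-\gamma)ku\ln u$ at $u = 0.7$ for $\gamma = 0.001$ and checks that it stays below $-0.01$ for all $5 \le k < 30$; it decreases in $k$ because $\ln 0.7 < 0$.

```python
import math


def entropy(x):
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def exponent(u, k, gamma):
    """2H(u) + (1 - gamma) k u ln u, the exponent (per n) of the first-moment bound."""
    return 2 * entropy(u) + (1 - gamma) * k * u * math.log(u)


values = [exponent(0.7, k, 0.001) for k in range(5, 30)]
print(values[0])   # the k = 5 case, which is the largest value in the list
```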

The cases $k \in \{3, 4\}$ are slightly more involved. There we will exploit Proposition 4.12.

Lemma 4.17. Let $0 < \gamma < 0.001$. Let $H$ be a $k$-graph, where $k \in \{3, 4\}$, and call a set $U \subseteq V_H$ bad if

$$e_U = \lceil(1-\gamma)|U|\rceil \quad\text{and}\quad \nexists\, e \in E_H : |e \cap U| = k-1.$$

Then, for any $c < 0.95$ and sufficiently large $n$
¥ {Hn^cn,3 contains a bad subset U with < n/2 vertices) < n~^. 
and for any $c < 0.98$ and sufficiently large $n$

¥ {Hn,cn,4 contains a bad subset U with < 3n/4 vertices) < n~'^. 
The proof is essentially the same as the proof of Lemma 4.2 in [14], and is thus omitted.
Proof of Theorem 4.2. Recall that it is sufficient to show that the core $C$ of $H_{n,p,k}$ contains, with probability $1 - o(n^{-1/2})$, no maximal $(1-\varepsilon^3)$-dense subsets (see also the discussion in Section 4.1), where $p = ck/\binom{n-1}{k-1}$.

First of all, recall that the above discussion implies that it remains to consider $\beta < 0.7$. Let $k \ge 5$. By applying Lemma 4.16 we obtain that $H_{n,cn,k}$ does not contain any $(1-\varepsilon^3)$-dense set with fewer than $0.7n$ vertices, and the same is true for $H_{n,p,k}$, by Proposition 4.5 and Theorem 4.6. In particular, since $0.7N_2 \le 0.7n$, $C$ does not contain such a subset, and the proof is completed.

The case $k = 3$ requires slightly more work. Lemma 4.17 and the fact $c_3^* < 0.95$ guarantee that $C$ has no subset $S$ with at most $n/2$ vertices such that $e_S = \lceil(1-\varepsilon^3)|S|\rceil$ and no edge contains precisely two vertices of $S$. In other words, by using Proposition 4.12, $C$ does not contain a maximal $(1-\varepsilon^3)$-dense set with at most $n/2$ vertices. However, we know that with probability $1 - o(1)$

$$N_2 = \left(1 - e^{-\xi^*} - \xi^*e^{-\xi^*} \pm O(\delta)\right)n, \quad\text{where}\quad 3 = \frac{\xi^*(1-e^{-\xi^*})}{1 - e^{-\xi^*} - \xi^*e^{-\xi^*}}.$$

Numerical calculations imply that $N_2 < 0.64n$ for any $\delta$ that is small enough. So, $C$ does not contain any maximal $(1-\varepsilon^3)$-dense subset with fewer than $0.77N_2 < N_2/(2\cdot 0.64) < n/2$ vertices. This completes the proof for $k = 3$; the case $k = 4$ follows similarly by using the second part of the conclusion of Lemma 4.17 and the fact that $c_4^* < 0.98$. □
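The numerical facts used for $k \in \{3, 4\}$ can be reproduced with a few lines of code. This sketch (hypothetical names) solves $f(\xi^*) = 1$ by bisection, with $f$ as in (4.5), and reports $c_k^* = g(\xi^*)$ together with the asymptotic core fraction $1 - e^{-\xi^*} - \xi^*e^{-\xi^*}$.

```python
import math


def f(x, k):
    a = -math.expm1(-x)                                # 1 - e^{-x}
    return x * a / (k * (a - x * math.exp(-x)))


def xi_star(k, iters=200):
    lo, hi = 1e-3, float(k)                            # f(lo) < 1 < f(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid, k) < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)


def c_star(k):
    xs = xi_star(k)
    return xs / (k * (-math.expm1(-xs)) ** (k - 1))    # g(xi*)


def core_fraction(k):
    xs = xi_star(k)
    return 1 - math.exp(-xs) - xs * math.exp(-xs)


print(c_star(3), core_fraction(3), c_star(4))
```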



5 Spanning properties of $H^*_{n,m,k}$
5.1 Proof of Proposition 2.4

Observe that any set $S$ of edges of a $k$-uniform hypergraph that spans a connected hypergraph can contain at most $(k-1)|S| + 1$ vertices. Proposition 2.4 states that with high probability every set of edges of $H^*_{n,m,k}$ that is not too large contains a number of vertices that almost matches this upper bound. We prove this using a first moment argument, providing an upper bound on the expected number of subsets of edges of size $s$ that span at most $t$ vertices. The proof is similar to that of Lemma 5 in [16], but suitably adjusted to our parameters.

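For ease of notation we write $t = (k-1)s - \delta$. (Later we will set $\delta = X_s s$ and $\delta = 1$, respectively.) The expected number of sets of edges of size $s$ which span at most $t$ vertices is at most

$$\binom{m}{s}\binom{n}{t}\left(\frac{t}{n}\right)^{ks} \le \left(\frac{c_k^*ne}{s}\right)^{s}\left(\frac{ne}{t}\right)^{t}\left(\frac{t}{n}\right)^{ks} \le n^{-\delta}e^{ks}\left(\frac{c_k^*}{s}\right)^{s}\big((k-1)s\big)^{s+\delta} = \left(\frac{(k-1)s}{n}\right)^{\delta}\left(c_k^*(k-1)e^k\right)^{s}.$$

Let $\zeta > 0$ be such that $(1+\zeta)c_k^* = 1$. To deduce the first claim we observe that for $\delta = X_s s$ the definition of $X_s$ implies that

$$\left(\frac{(k-1)s}{n}\right)^{X_s s}\left(c_k^*(k-1)e^k\right)^{s} \le (c_k^*)^{s} = (1+\zeta)^{-s}.$$

Taking the sum over all $s$ in the required range, we deduce that the probability that there exists a set of edges of size $s$, where $\log\log n \le s \le n/k$, spanning at most $(k-1-X_s)s$ vertices is $O\left((1+\zeta)^{-\log\log n}\right) = o(1)$.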
The proof of the second part follows the same lines, except that we use slightly cruder upper bounds. In particular, we bound $c_k^* \le 1$. Setting $\delta = 1$, we deduce from the formula above that, for each $s \le \log\log n$, the expected number of sets of edges of size $s$ which span at most $t = (k-1)s - 1$ vertices is at most

$$\frac{k\log\log n}{n}\left(ke^k\right)^{\log\log n} = o(1);$$

summing over the at most $\log\log n$ possible values of $s$ still gives $o(1)$.
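The chain of inequalities at the start of this subsection can be spot-checked numerically in log space. In the sketch below (hypothetical names), `log_exact` is the logarithm of $\binom{m}{s}\binom{n}{t}(t/n)^{ks}$ with $m = \lfloor cn\rfloor$ and $t = (k-1)s-\delta$, and `log_bound` is the logarithm of $((k-1)s/n)^{\delta}(c(k-1)e^k)^s$; the constant $c$ plays the role of $c_k^*$, an assumption made only for this illustration.

```python
import math


def log_comb(a, b):
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def log_exact(n, k, c, s, delta):
    """log of binom(m, s) * binom(n, t) * (t/n)^(ks), m = floor(cn), t = (k-1)s - delta."""
    m = math.floor(c * n)
    t = (k - 1) * s - delta
    return log_comb(m, s) + log_comb(n, t) + k * s * math.log(t / n)


def log_bound(n, k, c, s, delta):
    """log of ((k-1)s/n)^delta * (c (k-1) e^k)^s."""
    return delta * math.log((k - 1) * s / n) + s * (math.log(c * (k - 1)) + k)


cases = [(10**6, 3, 0.9, 5, 1), (10**6, 4, 0.97, 50, 3), (10**5, 3, 0.9, 1000, 10)]
for case in cases:
    print(case, log_exact(*case), log_bound(*case))
```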

5.2 Proof of Lemma 2.5

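We will use the following auxiliary fact.

Proposition 5.1. For any constants $a, b > 0$ we have that whenever $D = D(a,b)$ is sufficiently large then

$$\prod_{i=1}^{j}\left(1 - \frac{a}{bi + D}\right) \ge j^{-a/b}\cdot b^{-a/b}\cdot e^{-a^2/(bD)} \quad\text{for all } j \ge 2/b.$$

Proof. Assume that $D \ge 2$ is large enough so that $a/D \le 0.5$. As $1 - x \ge e^{-x-x^2}$ for $0 \le x \le 0.5$, we thus obtain that

$$\prod_{i=1}^{j}\left(1 - \frac{a}{bi+D}\right) \ge \exp\left(-\sum_{i=1}^{j}\frac{a}{bi+D} - \sum_{i=1}^{j}\left(\frac{a}{bi+D}\right)^2\right). \tag{5.1}$$

Further, we have

$$\sum_{i=1}^{j}\frac{a}{bi+D} \le \int_0^j\frac{a}{bx+D}\,dx = \frac{a}{b}\log\frac{bj+D}{D}.$$

Similarly,

$$\sum_{i=1}^{j}\left(\frac{a}{bi+D}\right)^2 \le \int_0^j\frac{a^2}{(bx+D)^2}\,dx = \frac{a^2}{b}\left(\frac{1}{D} - \frac{1}{bj+D}\right) \le \frac{a^2}{bD}.$$

Now observe that for $j \ge 2/b$ we have $bj + D \le bjD$ and thus $\log(bj + D) \le \log(bj) + \log D$. The substitution of these two bounds into (5.1) thus yields

$$\prod_{i=1}^{j}\left(1 - \frac{a}{bi+D}\right) \ge \exp\left(-\frac{a}{b}\log(bj) - \frac{a^2}{bD}\right) = j^{-a/b}\cdot b^{-a/b}\cdot e^{-a^2/(bD)}.$$

□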
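A direct numerical check of Proposition 5.1 for a few parameter choices (all satisfying $D \ge 2$ and $a/D \le 1/2$, as in the proof) is sketched below; the function names are hypothetical.

```python
import math


def product_lhs(a, b, D, j):
    """prod_{i=1}^{j} (1 - a / (b i + D))."""
    prod = 1.0
    for i in range(1, j + 1):
        prod *= 1 - a / (b * i + D)
    return prod


def lower_bound(a, b, D, j):
    """j^{-a/b} * b^{-a/b} * exp(-a^2 / (b D))."""
    return (b * j) ** (-a / b) * math.exp(-a * a / (b * D))


params = [(1.0, 0.5, 10.0), (2.0, 1.0, 20.0), (0.6, 0.3, 8.0)]
for a, b, D in params:
    j0 = math.ceil(2 / b)                 # the proposition needs j >= 2/b
    worst = min(product_lhs(a, b, D, j) / lower_bound(a, b, D, j)
                for j in range(j0, 500))
    print(a, b, D, worst)                 # ratios >= 1 confirm the inequality
```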



Let $H = H(V, E)$ be a $k$-graph on $n$ vertices having Property $\mathcal{E}$. We also fix a vertex $v \in V$ and an orientation $h$ of the edges, and for all $i \ge 0$ we let $s_i$ be the number of vertices that are within $h$-distance at most $i$ from $v$. Note that $s_i = N_{h,i}(v)$, but we shall be using this symbol throughout this section to avoid an unnecessary notational burden. If all vertices within $h$-distance at most $i$ from $v$ are occupied, then Property $\mathcal{E}$ implies that

$$s_{i+1} \ge \begin{cases}(k-1-X_{s_i})\,s_i, & \text{if } s_i \ge \log\log n,\\ (k-1)\,s_i, & \text{if } s_i < \log\log n.\end{cases} \tag{5.2}$$
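To make the quantities $s_i$ concrete, the sketch below computes them by breadth-first search. It relies on an assumption about the definitions from Section 2, which are not restated here: an orientation $h$ assigns every edge (item) to one of its vertices (table cells), with each vertex receiving at most one edge, and from an occupied vertex $u$ one may step to any other vertex of the edge oriented to $u$, i.e., to an alternative cell of the item stored in $u$. The function name is hypothetical.

```python
from collections import deque


def ball_sizes(edges, h, v, max_i):
    """Return [s_0, ..., s_max_i], s_i = #vertices within h-distance <= i of v.

    edges[idx] is a tuple of k vertices and h[idx] is the vertex that edge idx
    is oriented to (each vertex receives at most one edge).
    """
    occupant = {h[idx]: idx for idx in range(len(edges))}
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if u not in occupant or dist[u] == max_i:
            continue  # free vertex, or depth limit reached: do not expand
        for w in edges[occupant[u]]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return [sum(1 for d in dist.values() if d <= i) for i in range(max_i + 1)]


# v = 0 holds edge (0, 1, 2); vertices 1 and 2 hold the edges (1, 3, 4) and (2, 5, 6).
print(ball_sizes([(0, 1, 2), (1, 3, 4), (2, 5, 6)], [0, 1, 2], 0, 3))  # [1, 3, 7, 7]
```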
Claim 5.2. Let $i_0 := \min\{i : s_i \ge \log\log n\}$. Then $i_0 \le \log_{k-1}\log\log n + 1$.

Proof. Observe that for all $2 \le i \le i_0$ we have $s_i \ge (k-1)s_{i-1} \ge \dots \ge (k-1)^{i-1}s_1 \ge (k-1)^i$, and the claim follows. □

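Claim 5.3. Let $t_\varepsilon := \lfloor(1-\varepsilon)\log_{k-1} n\rfloor$. Then there exists a $d_k > 0$ such that whenever $\varepsilon > 0$ is sufficiently small and $n$ is sufficiently large we have

$$s_{t_\varepsilon} \ge e^{-d_k/\varepsilon}\,n^{1-\varepsilon}.$$

Proof. Observe that by (2.1) we have $s_{t_\varepsilon} \le (k-1)^{t_\varepsilon+1} \le (k-1)n^{1-\varepsilon}$. Let $i_0$ be defined as in the previous claim. Hence, by the definition of $X_s$, we have for all $i_0 \le i \le t_\varepsilon$ that

$$X_{s_i} \le \frac{\log_k((k-1)e^k)}{\varepsilon\log_k n - \log_k(k-1) - 1} \le \frac{2\log_k((k-1)e^k)}{\varepsilon\log_k n}$$

for $n$ sufficiently large. Thus, for all such $i$ the first part of (5.2) yields

$$s_{i+1} \ge (k-1-X_{s_i})\,s_i \ge (k-1)\left(1 - \frac{2\log_k((k-1)e^k)}{(k-1)\varepsilon\log_k n}\right)s_i.$$

Set $\phi = \phi(\varepsilon, k, n) = 1 - \frac{2\log_k((k-1)e^k)}{(k-1)\varepsilon\log_k n}$. Applying the above estimate repeatedly and using Claim 5.2 we obtain

$$s_{t_\varepsilon} \ge (k-1)^{t_\varepsilon-i_0}\phi^{t_\varepsilon-i_0}s_{i_0} \ge (k-1)^{t_\varepsilon-\log_{k-1}\log\log n-1}\,\phi^{\log_{k-1}n}\log\log n = \frac{(k-1)^{t_\varepsilon}}{k-1}\,\phi^{\log_{k-1}n}.$$

Also, since $\frac{1}{(k-1)\log_k(k-1)} < 1$ for all $k \ge 3$, we obtain that for $n$ large enough and for all $k \ge 3$ we have $\phi^{\log_{k-1}n} \ge e^{-2\varepsilon^{-1}\log_k((k-1)e^k)}$. Thus, if $\varepsilon > 0$ is small enough,

$$s_{t_\varepsilon} \ge \frac{n^{1-\varepsilon}}{(k-1)^2}\,e^{-2\varepsilon^{-1}\log_k((k-1)e^k)} \ge n^{1-\varepsilon}\,e^{-3\varepsilon^{-1}\log_k((k-1)e^k)},$$

which proves the claim with $d_k = 3\log_k((k-1)e^k)$. □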

Claim 5.4. Let $t \ge i_0$. For every $\varepsilon$ sufficiently small, if $s_t \le \varepsilon n$, then for all $0 \le i \le t - i_0$ (where $i_0$ is as defined in Claim 5.2) we have

$$X_{s_{t-i}} \le \frac{\log_k((k-1)e^k)}{i\log_k(k-1-\gamma) + \log_k(1/\varepsilon) - 1}.$$

Proof. We will show the statement by induction on $i$. For $i = 0$ this is obtained directly from the definition of $X_{s_t}$:

$$X_{s_t} = \frac{\log_k((k-1)e^k)}{\log_k(n/s_t) - 1} \le \frac{\log_k((k-1)e^k)}{\log_k(1/\varepsilon) - 1}.$$

In general, we have

$$s_{t-(i+1)} \le \frac{s_{t-i}}{k-1-\gamma} \le \frac{s_t}{(k-1-\gamma)^{i+1}} \le \frac{\varepsilon n}{(k-1-\gamma)^{i+1}}.$$

Thus the definition of $X_s$ yields

$$X_{s_{t-(i+1)}} = \frac{\log_k((k-1)e^k)}{\log_k(n/s_{t-(i+1)}) - 1} \le \frac{\log_k((k-1)e^k)}{(i+1)\log_k(k-1-\gamma) + \log_k(1/\varepsilon) - 1}.$$

□
Claim 5.5. For every $k \ge 3$ and for every $\zeta > 0$, $\varepsilon = \varepsilon(\zeta, k) > 0$ sufficiently small and $n$ sufficiently large, if all vertices within $h$-distance

$$T := \log_{k-1}n + \left(\frac{\log_k((k-1)e^k)}{(k-1)\log_k(k-1)} + \zeta\right)\log_{k-1}\log_{k-1}n$$

from $v$ are occupied, then $s_T \ge \varepsilon n$.

Proof. We prove the claim by contradiction. Assume that $s_T < \varepsilon n$. Claim 5.4 implies that

$$X_{s_{T-i}} \le \frac{\log_k((k-1)e^k)}{i\log_k(k-1-\gamma) + \log_k(1/\varepsilon) - 1}$$

for all $0 \le i \le T - i_0$, and the first part of (5.2) thus implies that

$$s_{T-j} \le \frac{\varepsilon n}{\prod_{i=1}^{j}(k-1-X_{s_{T-i}})} \le \frac{\varepsilon n}{(k-1)^j}\cdot\frac{1}{\prod_{i=1}^{j}\left(1 - \frac{\log_k((k-1)e^k)/(k-1)}{i\log_k(k-1-\gamma) + \log_k(1/\varepsilon) - 1}\right)} \tag{5.3}$$

for all $0 \le j \le T - i_0$. We now apply Proposition 5.1 with $a = \log_k((k-1)e^k)/(k-1)$ and $b = \log_k(k-1-\gamma)$. Proposition 5.1 implies that whenever $\varepsilon$ is sufficiently small (such that $\log_k(1/\varepsilon) - 1 \ge D$, where $D = D(a,b)$ is as defined in Proposition 5.1), then

$$\prod_{i=1}^{j}\left(1 - \frac{\log_k((k-1)e^k)/(k-1)}{i\log_k(k-1-\gamma) + \log_k(1/\varepsilon) - 1}\right) \ge C_{\varepsilon,k}\,j^{-\frac{\log_k((k-1)e^k)}{(k-1)\log_k(k-1-\gamma)}}, \tag{5.4}$$

where $C_{\varepsilon,k}$ is an appropriately defined constant depending only on $\varepsilon$ and $k$. Then substituting the lower bound from (5.4) into (5.3) we obtain that for all $j \le \log_{k-1}n$ we have

$$s_{T-j} \le \frac{\varepsilon n}{C_{\varepsilon,k}(k-1)^j}\left(\log_{k-1}n\right)^{\frac{\log_k((k-1)e^k)}{(k-1)\log_k(k-1-\gamma)}}. \tag{5.5}$$

If we now set $R_{\varepsilon,k} := C_{\varepsilon,k}^{-1}e^{d_k/\varepsilon}$, where $d_k$ is the constant from Claim 5.3, we deduce that for

$$j := \varepsilon\log_{k-1}n + \frac{\log_k((k-1)e^k)}{(k-1)\log_k(k-1-\gamma)}\log_{k-1}\log_{k-1}n + \log_{k-1}R_{\varepsilon,k}$$

we have

$$s_{T-j} \le \varepsilon e^{-d_k/\varepsilon}n^{1-\varepsilon}. \tag{5.6}$$

If $\varepsilon$ is small enough, then in turn $\gamma$ is small enough so that for $n$ sufficiently large $T - j \ge \lfloor(1-\varepsilon)\log_{k-1}n\rfloor$; this, however, contradicts the lower bound from Claim 5.3. □



Note that Claim 5.5 completes the proof of Lemma 2.5.



5.3 Proof of Corollary 2.6

By Proposition 2.4, $H$ still has Property $\mathcal{E}$ with $n$ instead of $|V(H)|$. Using this, the proof of Corollary 2.6 follows exactly along the lines of the proof of Lemma 2.5.



6 Conclusion 

The main result of this paper asserts that for all $k \ge 3$ the random insertion algorithm succeeds in polylogarithmic time with high probability, for any number of inserted items arbitrarily close to the theoretically achievable load threshold for cuckoo hashing. In particular, we showed that the maximum number of steps performed by the algorithm is at most $\log^{c(k)+\zeta} n$
with high probabiHty, where c{k) = 2 + + O(^) and C > is arbitrary. 

One of the main ingredients of our proof is a precise description of the expansion properties of an associated random hypergraph, from which we were able to derive conclusions regarding the structure of any orientation of its edges. In particular, our analysis implies that there might be vertices that are far away from any free vertex, that is, a vertex such that no edge is oriented to it. In turn, this implies that $\Omega(\log n)$ steps are required to reach a free vertex if the initial insertion takes place in such a vertex. This immediately also implies a logarithmic lower bound on the maximum insertion time.

Clearly, if we need $\Omega(\log n)$ steps in the best case, then a random algorithm is likely to miss this best case. In our analysis, we thus divide the insertion process into phases: each phase lasts $O(\log n)$ steps, and it either ends successfully at a free vertex, or it fails. In the latter case, we assume that the worst case has occurred, namely that we start again at the furthest possible vertex. This clearly need not be the case (and it probably is not), but currently our methods do not seem strong enough to provide a way to analyze this.
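For concreteness, here is a minimal Python sketch of random-walk insertion for $k$-ary cuckoo hashing with one item per cell. It is an illustration under explicit assumptions rather than the exact procedure analysed in the paper: truly random hash functions are simulated by drawing and caching $k$ uniform cells per key, an item is placed directly if one of its cells is free, and otherwise it evicts the occupant of a uniformly chosen cell other than the one it was just evicted from. The class name and the `max_steps` cut-off are hypothetical.

```python
import random


class RandomWalkCuckoo:
    """k-ary cuckoo hash table (one item per cell) with random-walk insertion."""

    def __init__(self, n, k, rng):
        self.n, self.k, self.rng = n, k, rng
        self.table = [None] * n
        self.choices = {}            # key -> its k cells (idealized hashing)

    def positions(self, key):
        if key not in self.choices:
            self.choices[key] = [self.rng.randrange(self.n) for _ in range(self.k)]
        return self.choices[key]

    def insert(self, key, max_steps=10_000):
        """Insert key; return the number of evictions, or None on failure.

        On failure the item currently in hand is not stored (a real table
        would rehash or use a stash).
        """
        prev = None
        for step in range(max_steps):
            cells = self.positions(key)
            free = [c for c in cells if self.table[c] is None]
            if free:
                self.table[free[0]] = key
                return step
            options = [c for c in cells if c != prev] or cells
            c = self.rng.choice(options)
            key, self.table[c] = self.table[c], key   # evict the occupant of c
            prev = c
        return None


rng = random.Random(3)
table = RandomWalkCuckoo(n=3000, k=3, rng=rng)
steps = [table.insert(x) for x in range(2100)]       # load factor 0.7
print(max(s for s in steps if s is not None))
```

Avoiding the cell the item was just evicted from is a common refinement that prevents the walk from immediately undoing its last move.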